public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/6] Automatically use vector optabs
@ 2015-11-09 16:20 Richard Sandiford
  2015-11-09 16:21 ` [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c Richard Sandiford
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:20 UTC
  To: gcc-patches

The main goal of this series is to allow functions to be vectorised
simply by defining the associated optab.  At the moment you can get
a scalar square root instruction by defining an md pattern like sqrtdf2.
But if you want to have vectorised sqrt, you need to have a target-
specific C-level built-in function for the vector form of sqrt,
implement TARGET_BUILTIN_VECTORIZED_FUNCTION, and expand the sqrt in
the same way that the target expands other directly-called built-in
functions.  That seems unnecessarily indirect, especially when in
practice those target-specific functions tend to use patterns like
sqrtv2df2 anyway.  It also means GCC has less idea what the vector
function is doing.

The series uses the combined_fn enum and internal functions from
the patches I posted on Saturday to make the vectoriser pick up
vector optabs automatically and represent the result as a vector
internal function call.  For example, we can convert BUILT_IN_SQRT{F,,L}
or a scalar IFN_SQRT into a vector IFN_SQRT if the appropriate sqrtM2
optab is defined.
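
To make the idea concrete, here is a minimal, self-contained sketch of the
lookup described above.  It is a toy model, not GCC code: ScalarFn,
InternalFn, ToyTarget, toy_as_internal_fn and toy_vectorize_call are all
hypothetical names invented for illustration, and the "optab" is modelled
as a plain set of pattern names.  Only sqrt is modelled.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Toy stand-ins for illustration only; these are not GCC's real types.
enum class ScalarFn { BuiltInSqrtF, BuiltInSqrt, BuiltInSqrtL, IfnSqrt, BuiltInExp };
enum class InternalFn { Sqrt };

// Hypothetical target description: the names of the md patterns it defines.
struct ToyTarget {
  std::set<std::string> patterns;
};

// Map every scalar spelling of sqrt onto one internal function.
// Anything else has no internal-function equivalent in this toy.
std::optional<InternalFn> toy_as_internal_fn(ScalarFn fn) {
  switch (fn) {
    case ScalarFn::BuiltInSqrtF:
    case ScalarFn::BuiltInSqrt:
    case ScalarFn::BuiltInSqrtL:
    case ScalarFn::IfnSqrt:
      return InternalFn::Sqrt;
    default:
      return std::nullopt;
  }
}

// Return the internal function to use for the vector form, provided the
// target defines the matching sqrt<mode>2 pattern for VECTOR_MODE.
std::optional<InternalFn> toy_vectorize_call(const ToyTarget &target,
                                             ScalarFn fn,
                                             const std::string &vector_mode) {
  assert(!vector_mode.empty());
  std::optional<InternalFn> ifn = toy_as_internal_fn(fn);
  if (!ifn)
    return std::nullopt;
  // Only sqrt is modelled, so the pattern name is always sqrt<mode>2.
  const std::string pattern = "sqrt" + vector_mode + "2";
  if (target.patterns.count(pattern) == 0)
    return std::nullopt;
  return ifn;
}
```

In this model a target that lists only "sqrtv2df2" gets a vector sqrt for
mode "v2df" and nothing for "v4sf", without any target-specific built-in
function or hook being involved.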

Tested on x86_64-linux-gnu, aarch64-linux-gnu, arm-linux-gnu and
powerpc64-linux-gnu.

Thanks,
Richard


* [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
@ 2015-11-09 16:21 ` Richard Sandiford
  2015-11-10 10:21   ` Richard Biener
  2015-11-09 16:25 ` [PATCH 2/6] Make builtin_vectorized_function take a combined_fn Richard Sandiford
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:21 UTC
  To: gcc-patches

In practice all targets that can vectorise sqrt define the appropriate
sqrt<mode>2 optab.  The only case where this isn't immediately obvious
is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
be exercised for sqrt.

This patch therefore uses the internal function interface instead of
going via the target hook.


gcc/
	* tree-vect-patterns.c: Include internal-fn.h.
	(vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.

diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
index bab9a4f..a803e8c 100644
--- a/gcc/tree-vect-patterns.c
+++ b/gcc/tree-vect-patterns.c
@@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
 #include "builtins.h"
+#include "internal-fn.h"
 #include "case-cfn-macros.h"
 
 /* Pattern recognition functions  */
@@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec<gimple *> *stmts, tree *type_in,
   if (TREE_CODE (exp) == REAL_CST
       && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
     {
-      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
       *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
-      if (*type_in)
+      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
 	{
-	  gcall *stmt = gimple_build_call (newfn, 1, base);
-	  if (vectorizable_function (stmt, *type_in, *type_in)
-	      != NULL_TREE)
-	    {
-	      var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
-	      gimple_call_set_lhs (stmt, var);
-	      return stmt;
-	    }
+	  gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
+	  var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
+	  gimple_call_set_lhs (stmt, var);
+	  return stmt;
 	}
     }
 


* [PATCH 2/6] Make builtin_vectorized_function take a combined_fn
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
  2015-11-09 16:21 ` [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c Richard Sandiford
@ 2015-11-09 16:25 ` Richard Sandiford
  2015-11-10 10:36   ` Richard Biener
  2015-11-09 16:27 ` [PATCH 3/6] Vectorize internal functions Richard Sandiford
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:25 UTC
  To: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 2484 bytes --]

This patch replaces the fndecl argument to builtin_vectorized_function
with a combined_fn and gets the vectoriser to call it for internal
functions too.  The patch also moves vectorisation of machine-specific
built-ins to a new hook, builtin_md_vectorized_function.
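
As a rough illustration of why a single function code is convenient here,
the sketch below models a hook that switches on one unsigned code covering
both built-in and internal functions, with a macro grouping the variants of
one operation.  It is a toy under stated assumptions: ToyCombinedFn, the
TOY_* enumerators, TOY_CASE_CFN_FLOOR and toy_vectorized_function are
hypothetical names, and the returned strings merely stand in for the decls
a real hook would return.

```cpp
#include <cassert>
#include <string>

// Toy model of a "combined" function code: built-in and internal
// functions share one enumeration.  Not GCC's real combined_fn.
enum ToyCombinedFn : unsigned int {
  TOY_CFN_BUILT_IN_FLOORF,
  TOY_CFN_BUILT_IN_FLOOR,
  TOY_CFN_BUILT_IN_FLOORL,
  TOY_CFN_FLOOR,             // internal-function form of floor
  TOY_CFN_BUILT_IN_BSWAP32,  // built-in with no internal form in this toy
  TOY_CFN_BUILT_IN_EXP,      // no vector form in this toy
  TOY_CFN_LAST
};

// Group every spelling of floor under one set of case labels.  The final
// label has no colon so the macro can be written like a case label.
#define TOY_CASE_CFN_FLOOR        \
  case TOY_CFN_BUILT_IN_FLOORF:   \
  case TOY_CFN_BUILT_IN_FLOOR:    \
  case TOY_CFN_BUILT_IN_FLOORL:   \
  case TOY_CFN_FLOOR

// Hypothetical hook: takes the code as an unsigned int and returns the
// name of a vector routine, or an empty string if there is none.
std::string toy_vectorized_function(unsigned int fn) {
  assert(fn < TOY_CFN_LAST);
  switch (fn) {
    TOY_CASE_CFN_FLOOR:
      return "vector_floor";
    case TOY_CFN_BUILT_IN_BSWAP32:
      return "vector_bswap32";
    default:
      return "";
  }
}
```

The point of the grouping is that the hook no longer needs a function decl
to ask which class of built-in it has, and a call that arrives as an
internal function takes the same path as the equivalent built-in.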

I've attached a -b version too since that's easier to read.


gcc/
	* target.def (builtin_vectorized_function): Take a combined_fn (in
	the form of an unsigned int) rather than a function decl.
	(builtin_md_vectorized_function): New.
	* targhooks.h (default_builtin_vectorized_function): Replace the
	fndecl argument with an unsigned int.
	(default_builtin_md_vectorized_function): Declare.
	* targhooks.c (default_builtin_vectorized_function): Replace the
	fndecl argument with an unsigned int.
	(default_builtin_md_vectorized_function): New function.
	* doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION):
	New hook.
	* doc/tm.texi: Regenerate.
	* tree-vect-stmts.c (vectorizable_function): Update call to
	builtin_vectorized_function, also passing internal functions.
	Call builtin_md_vectorized_function for target-specific builtins.
	* config/aarch64/aarch64-protos.h
	(aarch64_builtin_vectorized_function): Replace fndecl argument
	with an unsigned int.
	* config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h.
	(aarch64_builtin_vectorized_function): Update after above changes.
	Use CASE_CFN_*.
	* config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace
	fndecl argument with an unsigned int.
	* config/arm/arm-builtins.c: Include case-cfn-macros.h.
	(arm_builtin_vectorized_function): Update after above changes.
	Use CASE_CFN_*.
	* config/i386/i386.c: Include case-cfn-macros.h.
	(ix86_veclib_handler): Take a combined_fn rather than a
	built_in_function.
	(ix86_veclibabi_svml, ix86_veclibabi_acml): Likewise.  Use
	mathfn_built_in rather than calling builtin_decl_implicit directly.
	(ix86_builtin_vectorized_function): Update after above changes.
	Use CASE_CFN_*.
	* config/rs6000/rs6000.c: Include case-cfn-macros.h.
	(rs6000_builtin_vectorized_libmass): Replace fndecl argument
	with a combined_fn.  Use CASE_CFN_*.  Use mathfn_built_in rather
	than calling builtin_decl_implicit directly.
	(rs6000_builtin_vectorized_function): Update after above changes.
	Use CASE_CFN_*.  Move BUILT_IN_MD to...
	(rs6000_builtin_md_vectorized_function): ...this new function.
	(TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): Define.


[-- Attachment #2: actual.patch --]
[-- Type: text/x-diff, Size: 59605 bytes --]

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b4208f..c4cda4f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -38,6 +38,7 @@
 #include "expr.h"
 #include "langhooks.h"
 #include "gimple-iterator.h"
+#include "case-cfn-macros.h"
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
@@ -1258,7 +1259,8 @@ aarch64_expand_builtin (tree exp,
 }
 
 tree
-aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
+				     tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
@@ -1282,130 +1284,119 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 	: (AARCH64_CHECK_BUILTIN_MODE (2, S) \
 	   ? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_##N##v2sf] \
 	   : NULL_TREE)))
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+  switch (fn)
     {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
-	{
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Fmode && out_n == C \
    && in_mode == N##Fmode && in_n == C)
-	case BUILT_IN_FLOOR:
-	case BUILT_IN_FLOORF:
-	  return AARCH64_FIND_FRINT_VARIANT (floor);
-	case BUILT_IN_CEIL:
-	case BUILT_IN_CEILF:
-	  return AARCH64_FIND_FRINT_VARIANT (ceil);
-	case BUILT_IN_TRUNC:
-	case BUILT_IN_TRUNCF:
-	  return AARCH64_FIND_FRINT_VARIANT (btrunc);
-	case BUILT_IN_ROUND:
-	case BUILT_IN_ROUNDF:
-	  return AARCH64_FIND_FRINT_VARIANT (round);
-	case BUILT_IN_NEARBYINT:
-	case BUILT_IN_NEARBYINTF:
-	  return AARCH64_FIND_FRINT_VARIANT (nearbyint);
-	case BUILT_IN_SQRT:
-	case BUILT_IN_SQRTF:
-	  return AARCH64_FIND_FRINT_VARIANT (sqrt);
+    CASE_CFN_FLOOR:
+      return AARCH64_FIND_FRINT_VARIANT (floor);
+    CASE_CFN_CEIL:
+      return AARCH64_FIND_FRINT_VARIANT (ceil);
+    CASE_CFN_TRUNC:
+      return AARCH64_FIND_FRINT_VARIANT (btrunc);
+    CASE_CFN_ROUND:
+      return AARCH64_FIND_FRINT_VARIANT (round);
+    CASE_CFN_NEARBYINT:
+      return AARCH64_FIND_FRINT_VARIANT (nearbyint);
+    CASE_CFN_SQRT:
+      return AARCH64_FIND_FRINT_VARIANT (sqrt);
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == SImode && out_n == C \
    && in_mode == N##Imode && in_n == C)
-        case BUILT_IN_CLZ:
-          {
-            if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-              return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
-            return NULL_TREE;
-          }
-	case BUILT_IN_CTZ:
-          {
-	    if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	      return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
-	    else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	      return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si];
-	    return NULL_TREE;
-          }
+    CASE_CFN_CLZ:
+      {
+	if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
+	return NULL_TREE;
+      }
+    CASE_CFN_CTZ:
+      {
+	if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
+	else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si];
+	return NULL_TREE;
+      }
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
    && in_mode == N##Fmode && in_n == C)
-	case BUILT_IN_LFLOOR:
-	case BUILT_IN_LFLOORF:
-	case BUILT_IN_LLFLOOR:
-	case BUILT_IN_IFLOORF:
-	  {
-	    enum aarch64_builtins builtin;
-	    if (AARCH64_CHECK_BUILTIN_MODE (2, D))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv2dfv2di;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv4sfv4si;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv2sfv2si;
-	    else
-	      return NULL_TREE;
-
-	    return aarch64_builtin_decls[builtin];
-	  }
-	case BUILT_IN_LCEIL:
-	case BUILT_IN_LCEILF:
-	case BUILT_IN_LLCEIL:
-	case BUILT_IN_ICEILF:
-	  {
-	    enum aarch64_builtins builtin;
-	    if (AARCH64_CHECK_BUILTIN_MODE (2, D))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv2dfv2di;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv4sfv4si;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	      builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv2sfv2si;
-	    else
-	      return NULL_TREE;
-
-	    return aarch64_builtin_decls[builtin];
-	  }
-	case BUILT_IN_LROUND:
-	case BUILT_IN_IROUNDF:
-	  {
-	    enum aarch64_builtins builtin;
-	    if (AARCH64_CHECK_BUILTIN_MODE (2, D))
-	      builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv2dfv2di;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	      builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv4sfv4si;
-	    else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	      builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv2sfv2si;
-	    else
-	      return NULL_TREE;
-
-	    return aarch64_builtin_decls[builtin];
-	  }
-	case BUILT_IN_BSWAP16:
+    CASE_CFN_IFLOOR:
+    CASE_CFN_LFLOOR:
+    CASE_CFN_LLFLOOR:
+      {
+	enum aarch64_builtins builtin;
+	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv2dfv2di;
+	else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv4sfv4si;
+	else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lfloorv2sfv2si;
+	else
+	  return NULL_TREE;
+
+	return aarch64_builtin_decls[builtin];
+      }
+    CASE_CFN_ICEIL:
+    CASE_CFN_LCEIL:
+    CASE_CFN_LLCEIL:
+      {
+	enum aarch64_builtins builtin;
+	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv2dfv2di;
+	else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv4sfv4si;
+	else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	  builtin = AARCH64_SIMD_BUILTIN_UNOP_lceilv2sfv2si;
+	else
+	  return NULL_TREE;
+
+	return aarch64_builtin_decls[builtin];
+      }
+    CASE_CFN_IROUND:
+    CASE_CFN_LROUND:
+    CASE_CFN_LLROUND:
+      {
+	enum aarch64_builtins builtin;
+	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	  builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv2dfv2di;
+	else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	  builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv4sfv4si;
+	else if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	  builtin =	AARCH64_SIMD_BUILTIN_UNOP_lroundv2sfv2si;
+	else
+	  return NULL_TREE;
+
+	return aarch64_builtin_decls[builtin];
+      }
+    case CFN_BUILT_IN_BSWAP16:
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
    && in_mode == N##Imode && in_n == C)
-	  if (AARCH64_CHECK_BUILTIN_MODE (4, H))
-	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv4hi];
-	  else if (AARCH64_CHECK_BUILTIN_MODE (8, H))
-	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv8hi];
-	  else
-	    return NULL_TREE;
-	case BUILT_IN_BSWAP32:
-	  if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2si];
-	  else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv4si];
-	  else
-	    return NULL_TREE;
-	case BUILT_IN_BSWAP64:
-	  if (AARCH64_CHECK_BUILTIN_MODE (2, D))
-	    return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
-	  else
-	    return NULL_TREE;
-	default:
-	  return NULL_TREE;
-      }
+      if (AARCH64_CHECK_BUILTIN_MODE (4, H))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv4hi];
+      else if (AARCH64_CHECK_BUILTIN_MODE (8, H))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv8hi];
+      else
+	return NULL_TREE;
+    case CFN_BUILT_IN_BSWAP32:
+      if (AARCH64_CHECK_BUILTIN_MODE (2, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2si];
+      else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv4si];
+      else
+	return NULL_TREE;
+    case CFN_BUILT_IN_BSWAP64:
+      if (AARCH64_CHECK_BUILTIN_MODE (2, D))
+	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
+      else
+	return NULL_TREE;
+    default:
+      return NULL_TREE;
     }
 
   return NULL_TREE;
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 0f20f60..c77dbbf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -407,10 +407,7 @@ tree aarch64_builtin_decl (unsigned, bool ATTRIBUTE_UNUSED);
 
 tree aarch64_builtin_rsqrt (unsigned int, bool);
 
-tree
-aarch64_builtin_vectorized_function (tree fndecl,
-				     tree type_out,
-				     tree type_in);
+tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index bad3dc3..ee2e7b0 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -35,6 +35,7 @@
 #include "explow.h"
 #include "expr.h"
 #include "langhooks.h"
+#include "case-cfn-macros.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 5
 
@@ -2812,7 +2813,7 @@ arm_expand_builtin (tree exp,
 }
 
 tree
-arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
@@ -2849,19 +2850,16 @@ arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
       ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sf, false) \
       : NULL_TREE))
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+  switch (fn)
     {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
-        {
-          case BUILT_IN_FLOORF:
-            return ARM_FIND_VRINT_VARIANT (vrintm);
-          case BUILT_IN_CEILF:
-            return ARM_FIND_VRINT_VARIANT (vrintp);
-          case BUILT_IN_TRUNCF:
-            return ARM_FIND_VRINT_VARIANT (vrintz);
-          case BUILT_IN_ROUNDF:
-            return ARM_FIND_VRINT_VARIANT (vrinta);
+    CASE_CFN_FLOOR:
+      return ARM_FIND_VRINT_VARIANT (vrintm);
+    CASE_CFN_CEIL:
+      return ARM_FIND_VRINT_VARIANT (vrintp);
+    CASE_CFN_TRUNC:
+      return ARM_FIND_VRINT_VARIANT (vrintz);
+    CASE_CFN_ROUND:
+      return ARM_FIND_VRINT_VARIANT (vrinta);
 #undef ARM_CHECK_BUILTIN_MODE_1
 #define ARM_CHECK_BUILTIN_MODE_1(C) \
   (out_mode == SImode && out_n == C \
@@ -2880,52 +2878,51 @@ arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
    : (ARM_CHECK_BUILTIN_MODE (4) \
      ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv4sfv4si, false) \
      : NULL_TREE))
-          case BUILT_IN_LROUNDF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvta)
-                     : ARM_FIND_VCVT_VARIANT (vcvta);
-          case BUILT_IN_LCEILF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvtp)
-                     : ARM_FIND_VCVT_VARIANT (vcvtp);
-          case BUILT_IN_LFLOORF:
-            return out_unsigned_p
-                     ? ARM_FIND_VCVTU_VARIANT (vcvtm)
-                     : ARM_FIND_VCVT_VARIANT (vcvtm);
+    CASE_CFN_LROUND:
+      return (out_unsigned_p
+	      ? ARM_FIND_VCVTU_VARIANT (vcvta)
+	      : ARM_FIND_VCVT_VARIANT (vcvta));
+    CASE_CFN_LCEIL:
+      return (out_unsigned_p
+	      ? ARM_FIND_VCVTU_VARIANT (vcvtp)
+	      : ARM_FIND_VCVT_VARIANT (vcvtp));
+    CASE_CFN_LFLOOR:
+      return (out_unsigned_p
+	      ? ARM_FIND_VCVTU_VARIANT (vcvtm)
+	      : ARM_FIND_VCVT_VARIANT (vcvtm));
 #undef ARM_CHECK_BUILTIN_MODE
 #define ARM_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##mode && out_n == C \
    && in_mode == N##mode && in_n == C)
-          case BUILT_IN_BSWAP16:
-            if (ARM_CHECK_BUILTIN_MODE (4, HI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4hi, false);
-            else if (ARM_CHECK_BUILTIN_MODE (8, HI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv8hi, false);
-            else
-              return NULL_TREE;
-          case BUILT_IN_BSWAP32:
-            if (ARM_CHECK_BUILTIN_MODE (2, SI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2si, false);
-            else if (ARM_CHECK_BUILTIN_MODE (4, SI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4si, false);
-            else
-              return NULL_TREE;
-          case BUILT_IN_BSWAP64:
-            if (ARM_CHECK_BUILTIN_MODE (2, DI))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2di, false);
-            else
-              return NULL_TREE;
-	  case BUILT_IN_COPYSIGNF:
-	    if (ARM_CHECK_BUILTIN_MODE (2, SF))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv2sf, false);
-	    else if (ARM_CHECK_BUILTIN_MODE (4, SF))
-              return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv4sf, false);
-	    else
-	      return NULL_TREE;
-
-          default:
-            return NULL_TREE;
-        }
+    case CFN_BUILT_IN_BSWAP16:
+      if (ARM_CHECK_BUILTIN_MODE (4, HI))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4hi, false);
+      else if (ARM_CHECK_BUILTIN_MODE (8, HI))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv8hi, false);
+      else
+	return NULL_TREE;
+    case CFN_BUILT_IN_BSWAP32:
+      if (ARM_CHECK_BUILTIN_MODE (2, SI))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2si, false);
+      else if (ARM_CHECK_BUILTIN_MODE (4, SI))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4si, false);
+      else
+	return NULL_TREE;
+    case CFN_BUILT_IN_BSWAP64:
+      if (ARM_CHECK_BUILTIN_MODE (2, DI))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2di, false);
+      else
+	return NULL_TREE;
+    CASE_CFN_COPYSIGN:
+      if (ARM_CHECK_BUILTIN_MODE (2, SF))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv2sf, false);
+      else if (ARM_CHECK_BUILTIN_MODE (4, SF))
+	return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv4sf, false);
+      else
+	return NULL_TREE;
+
+    default:
+      return NULL_TREE;
     }
   return NULL_TREE;
 }
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..10c96b2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -84,7 +84,7 @@ extern char *neon_output_shift_immediate (const char *, char, rtx *,
 extern void neon_pairwise_reduce (rtx, rtx, machine_mode,
 				  rtx (*) (rtx, rtx, rtx));
 extern rtx neon_make_constant (rtx);
-extern tree arm_builtin_vectorized_function (tree, tree, tree);
+extern tree arm_builtin_vectorized_function (unsigned int, tree, tree);
 extern void neon_expand_vector_init (rtx, rtx);
 extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
 extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bb37aba..a1d59a5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
 #include "dbgcnt.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2611,10 +2612,10 @@ static int ix86_tune_defaulted;
 static int ix86_arch_specified;
 
 /* Vectorization library interface and handlers.  */
-static tree (*ix86_veclib_handler) (enum built_in_function, tree, tree);
+static tree (*ix86_veclib_handler) (combined_fn, tree, tree);
 
-static tree ix86_veclibabi_svml (enum built_in_function, tree, tree);
-static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
+static tree ix86_veclibabi_svml (combined_fn, tree, tree);
+static tree ix86_veclibabi_acml (combined_fn, tree, tree);
 
 /* Processor target table, indexed by processor number */
 struct ptt
@@ -41723,21 +41724,19 @@ ix86_store_returned_bounds (rtx slot, rtx bounds)
   emit_move_insn (slot, bounds);
 }
 
-/* Returns a function decl for a vectorized version of the builtin function
-   with builtin function code FN and the result vector type TYPE, or NULL_TREE
+/* Returns a function decl for a vectorized version of the combined function
+   with combined_fn code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
 
 static tree
-ix86_builtin_vectorized_function (tree fndecl, tree type_out,
+ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
 				  tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
-  enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
 
   if (TREE_CODE (type_out) != VECTOR_TYPE
-      || TREE_CODE (type_in) != VECTOR_TYPE
-      || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+      || TREE_CODE (type_in) != VECTOR_TYPE)
     return NULL_TREE;
 
   out_mode = TYPE_MODE (TREE_TYPE (type_out));
@@ -41747,7 +41746,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 
   switch (fn)
     {
-    case BUILT_IN_SQRT:
+    CASE_CFN_SQRT:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -41757,17 +41756,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD512);
 	}
-      break;
-
-    case BUILT_IN_EXP2F:
-      if (out_mode == SFmode && in_mode == SFmode)
-	{
-	  if (out_n == 16 && in_n == 16)
-	    return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
-	}
-      break;
-
-    case BUILT_IN_SQRTF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41779,9 +41767,17 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IFLOOR:
-    case BUILT_IN_LFLOOR:
-    case BUILT_IN_LLFLOOR:
+    CASE_CFN_EXP2:
+      if (out_mode == SFmode && in_mode == SFmode)
+	{
+	  if (out_n == 16 && in_n == 16)
+	    return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
+	}
+      break;
+
+    CASE_CFN_IFLOOR:
+    CASE_CFN_LFLOOR:
+    CASE_CFN_LLFLOOR:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41795,15 +41791,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_IFLOORF:
-    case BUILT_IN_LFLOORF:
-    case BUILT_IN_LLFLOORF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41813,9 +41800,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_ICEIL:
-    case BUILT_IN_LCEIL:
-    case BUILT_IN_LLCEIL:
+    CASE_CFN_ICEIL:
+    CASE_CFN_LCEIL:
+    CASE_CFN_LLCEIL:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41829,15 +41816,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_ICEILF:
-    case BUILT_IN_LCEILF:
-    case BUILT_IN_LLCEILF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41847,9 +41825,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IRINT:
-    case BUILT_IN_LRINT:
-    case BUILT_IN_LLRINT:
+    CASE_CFN_IRINT:
+    CASE_CFN_LRINT:
+    CASE_CFN_LLRINT:
       if (out_mode == SImode && in_mode == DFmode)
 	{
 	  if (out_n == 4 && in_n == 2)
@@ -41857,11 +41835,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX256);
 	}
-      break;
-
-    case BUILT_IN_IRINTF:
-    case BUILT_IN_LRINTF:
-    case BUILT_IN_LLRINTF:
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41871,9 +41844,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IROUND:
-    case BUILT_IN_LROUND:
-    case BUILT_IN_LLROUND:
+    CASE_CFN_IROUND:
+    CASE_CFN_LROUND:
+    CASE_CFN_LLROUND:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41887,15 +41860,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_IROUNDF:
-    case BUILT_IN_LROUNDF:
-    case BUILT_IN_LLROUNDF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41905,7 +41869,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_COPYSIGN:
+    CASE_CFN_COPYSIGN:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -41915,9 +41879,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD512);
 	}
-      break;
-
-    case BUILT_IN_COPYSIGNF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41929,7 +41890,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_FLOOR:
+    CASE_CFN_FLOOR:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41941,13 +41902,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_FLOORPD256);
 	}
-      break;
-
-    case BUILT_IN_FLOORF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41957,7 +41911,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_CEIL:
+    CASE_CFN_CEIL:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41969,13 +41923,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_CEILPD256);
 	}
-      break;
-
-    case BUILT_IN_CEILF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41985,7 +41932,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_TRUNC:
+    CASE_CFN_TRUNC:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41997,13 +41944,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_TRUNCPD256);
 	}
-      break;
-
-    case BUILT_IN_TRUNCF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42013,7 +41953,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_RINT:
+    CASE_CFN_RINT:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -42025,13 +41965,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_RINTPD256);
 	}
-      break;
-
-    case BUILT_IN_RINTF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42041,7 +41974,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_ROUND:
+    CASE_CFN_ROUND:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -42053,13 +41986,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ256);
 	}
-      break;
-
-    case BUILT_IN_ROUNDF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42069,7 +41995,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_FMA:
+    CASE_CFN_FMA:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -42077,9 +42003,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_VFMADDPD256);
 	}
-      break;
-
-    case BUILT_IN_FMAF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 
   /* Dispatch to a handler for a vectorization library.  */
   if (ix86_veclib_handler)
-    return ix86_veclib_handler ((enum built_in_function) fn, type_out,
-				type_in);
+    return ix86_veclib_handler (combined_fn (fn), type_out, type_in);
 
   return NULL_TREE;
 }
@@ -42105,7 +42027,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
    a library with vectorized intrinsics.  */
 
 static tree
-ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
+ix86_veclibabi_svml (combined_fn fn, tree type_out, tree type_in)
 {
   char name[20];
   tree fntype, new_fndecl, args;
@@ -42128,47 +42050,26 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
 
   switch (fn)
     {
-    case BUILT_IN_EXP:
-    case BUILT_IN_LOG:
-    case BUILT_IN_LOG10:
-    case BUILT_IN_POW:
-    case BUILT_IN_TANH:
-    case BUILT_IN_TAN:
-    case BUILT_IN_ATAN:
-    case BUILT_IN_ATAN2:
-    case BUILT_IN_ATANH:
-    case BUILT_IN_CBRT:
-    case BUILT_IN_SINH:
-    case BUILT_IN_SIN:
-    case BUILT_IN_ASINH:
-    case BUILT_IN_ASIN:
-    case BUILT_IN_COSH:
-    case BUILT_IN_COS:
-    case BUILT_IN_ACOSH:
-    case BUILT_IN_ACOS:
-      if (el_mode != DFmode || n != 2)
-	return NULL_TREE;
-      break;
-
-    case BUILT_IN_EXPF:
-    case BUILT_IN_LOGF:
-    case BUILT_IN_LOG10F:
-    case BUILT_IN_POWF:
-    case BUILT_IN_TANHF:
-    case BUILT_IN_TANF:
-    case BUILT_IN_ATANF:
-    case BUILT_IN_ATAN2F:
-    case BUILT_IN_ATANHF:
-    case BUILT_IN_CBRTF:
-    case BUILT_IN_SINHF:
-    case BUILT_IN_SINF:
-    case BUILT_IN_ASINHF:
-    case BUILT_IN_ASINF:
-    case BUILT_IN_COSHF:
-    case BUILT_IN_COSF:
-    case BUILT_IN_ACOSHF:
-    case BUILT_IN_ACOSF:
-      if (el_mode != SFmode || n != 4)
+    CASE_CFN_EXP:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_POW:
+    CASE_CFN_TANH:
+    CASE_CFN_TAN:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_CBRT:
+    CASE_CFN_SINH:
+    CASE_CFN_SIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ASIN:
+    CASE_CFN_COSH:
+    CASE_CFN_COS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ACOS:
+      if ((el_mode != DFmode || n != 2)
+	  && (el_mode != SFmode || n != 4))
 	return NULL_TREE;
       break;
 
@@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
       return NULL_TREE;
     }
 
-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
 
-  if (fn == BUILT_IN_LOGF)
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
     strcpy (name, "vmlsLn4");
-  else if (fn == BUILT_IN_LOG)
+  else if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOG)
     strcpy (name, "vmldLn2");
   else if (n == 4)
     {
@@ -42194,9 +42096,7 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
   name[4] &= ~0x20;
 
   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-       args;
-       args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
     arity++;
 
   if (arity == 1)
@@ -42219,7 +42119,7 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
    a library with vectorized intrinsics.  */
 
 static tree
-ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
+ix86_veclibabi_acml (combined_fn fn, tree type_out, tree type_in)
 {
   char name[20] = "__vr.._";
   tree fntype, new_fndecl, args;
@@ -42245,30 +42145,23 @@ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
 
   switch (fn)
     {
-    case BUILT_IN_SIN:
-    case BUILT_IN_COS:
-    case BUILT_IN_EXP:
-    case BUILT_IN_LOG:
-    case BUILT_IN_LOG2:
-    case BUILT_IN_LOG10:
-      name[4] = 'd';
-      name[5] = '2';
-      if (el_mode != DFmode
-	  || n != 2)
-	return NULL_TREE;
-      break;
-
-    case BUILT_IN_SINF:
-    case BUILT_IN_COSF:
-    case BUILT_IN_EXPF:
-    case BUILT_IN_POWF:
-    case BUILT_IN_LOGF:
-    case BUILT_IN_LOG2F:
-    case BUILT_IN_LOG10F:
-      name[4] = 's';
-      name[5] = '4';
-      if (el_mode != SFmode
-	  || n != 4)
+    CASE_CFN_SIN:
+    CASE_CFN_COS:
+    CASE_CFN_EXP:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG2:
+    CASE_CFN_LOG10:
+      if (el_mode == DFmode && n == 2)
+	{
+	  name[4] = 'd';
+	  name[5] = '2';
+	}
+      else if (el_mode == SFmode && n == 4)
+	{
+	  name[4] = 's';
+	  name[5] = '4';
+	}
+      else
 	return NULL_TREE;
       break;
 
@@ -42276,13 +42169,12 @@ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
       return NULL_TREE;
     }
 
-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
   sprintf (name + 7, "%s", bname+10);
 
   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-       args;
-       args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
     arity++;
 
   if (arity == 1)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8bdd646..26a0410 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -70,6 +70,7 @@
 #if TARGET_MACHO
 #include "gstab.h"  /* for N_SLINE */
 #endif
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1077,7 +1078,7 @@ static const struct rs6000_builtin_info_type rs6000_builtin_info[] =
 #undef RS6000_BUILTIN_X
 
 /* Support for -mveclibabi=<xxx> to control which vector library to use.  */
-static tree (*rs6000_veclib_handler) (tree, tree, tree);
+static tree (*rs6000_veclib_handler) (combined_fn, tree, tree);
 
 \f
 static bool rs6000_debug_legitimate_address_p (machine_mode, rtx, bool);
@@ -1087,7 +1088,7 @@ static int rs6000_ra_ever_killed (void);
 static tree rs6000_handle_longcall_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
-static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
+static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
@@ -1576,6 +1577,10 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
   rs6000_builtin_vectorized_function
 
+#undef TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION \
+  rs6000_builtin_md_vectorized_function
+
 #if !TARGET_MACHO
 #undef TARGET_STACK_PROTECT_FAIL
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
@@ -4775,7 +4780,8 @@ rs6000_destroy_cost_data (void *data)
    library with vectorized intrinsics.  */
 
 static tree
-rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+rs6000_builtin_vectorized_libmass (combined_fn fn, tree type_out,
+				   tree type_in)
 {
   char name[32];
   const char *suffix = NULL;
@@ -4800,93 +4806,57 @@ rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
       || n != in_n)
     return NULL_TREE;
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
-	{
-	case BUILT_IN_ATAN2:
-	case BUILT_IN_HYPOT:
-	case BUILT_IN_POW:
-	  n_args = 2;
-	  /* fall through */
-
-	case BUILT_IN_ACOS:
-	case BUILT_IN_ACOSH:
-	case BUILT_IN_ASIN:
-	case BUILT_IN_ASINH:
-	case BUILT_IN_ATAN:
-	case BUILT_IN_ATANH:
-	case BUILT_IN_CBRT:
-	case BUILT_IN_COS:
-	case BUILT_IN_COSH:
-	case BUILT_IN_ERF:
-	case BUILT_IN_ERFC:
-	case BUILT_IN_EXP2:
-	case BUILT_IN_EXP:
-	case BUILT_IN_EXPM1:
-	case BUILT_IN_LGAMMA:
-	case BUILT_IN_LOG10:
-	case BUILT_IN_LOG1P:
-	case BUILT_IN_LOG2:
-	case BUILT_IN_LOG:
-	case BUILT_IN_SIN:
-	case BUILT_IN_SINH:
-	case BUILT_IN_SQRT:
-	case BUILT_IN_TAN:
-	case BUILT_IN_TANH:
-	  bdecl = builtin_decl_implicit (fn);
-	  suffix = "d2";				/* pow -> powd2 */
-	  if (el_mode != DFmode
-	      || n != 2
-	      || !bdecl)
-	    return NULL_TREE;
-	  break;
+  switch (fn)
+    {
+    CASE_CFN_ATAN2:
+    CASE_CFN_HYPOT:
+    CASE_CFN_POW:
+      n_args = 2;
+      /* fall through */
 
-	case BUILT_IN_ATAN2F:
-	case BUILT_IN_HYPOTF:
-	case BUILT_IN_POWF:
-	  n_args = 2;
-	  /* fall through */
-
-	case BUILT_IN_ACOSF:
-	case BUILT_IN_ACOSHF:
-	case BUILT_IN_ASINF:
-	case BUILT_IN_ASINHF:
-	case BUILT_IN_ATANF:
-	case BUILT_IN_ATANHF:
-	case BUILT_IN_CBRTF:
-	case BUILT_IN_COSF:
-	case BUILT_IN_COSHF:
-	case BUILT_IN_ERFF:
-	case BUILT_IN_ERFCF:
-	case BUILT_IN_EXP2F:
-	case BUILT_IN_EXPF:
-	case BUILT_IN_EXPM1F:
-	case BUILT_IN_LGAMMAF:
-	case BUILT_IN_LOG10F:
-	case BUILT_IN_LOG1PF:
-	case BUILT_IN_LOG2F:
-	case BUILT_IN_LOGF:
-	case BUILT_IN_SINF:
-	case BUILT_IN_SINHF:
-	case BUILT_IN_SQRTF:
-	case BUILT_IN_TANF:
-	case BUILT_IN_TANHF:
-	  bdecl = builtin_decl_implicit (fn);
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATANH:
+    CASE_CFN_CBRT:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_ERFC:
+    CASE_CFN_EXP2:
+    CASE_CFN_EXP:
+    CASE_CFN_EXPM1:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG1P:
+    CASE_CFN_LOG2:
+    CASE_CFN_LOG:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+      if (el_mode == DFmode && n == 2)
+	{
+	  bdecl = mathfn_built_in (double_type_node, fn);
+	  suffix = "d2";				/* pow -> powd2 */
+	}
+      else if (el_mode == SFmode && n == 4)
+	{
+	  bdecl = mathfn_built_in (float_type_node, fn);
 	  suffix = "4";					/* powf -> powf4 */
-	  if (el_mode != SFmode
-	      || n != 4
-	      || !bdecl)
-	    return NULL_TREE;
-	  break;
-
-	default:
-	  return NULL_TREE;
 	}
+      else
+	return NULL_TREE;
+      if (!bdecl)
+	return NULL_TREE;
+      break;
+
+    default:
+      return NULL_TREE;
     }
-  else
-    return NULL_TREE;
 
   gcc_assert (suffix != NULL);
   bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
@@ -4919,7 +4889,7 @@ rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
    if it is not available.  */
 
 static tree
-rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
+rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
 				    tree type_in)
 {
   machine_mode in_mode, out_mode;
@@ -4927,7 +4897,7 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
 
   if (TARGET_DEBUG_BUILTIN)
     fprintf (stderr, "rs6000_builtin_vectorized_function (%s, %s, %s)\n",
-	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+	     combined_fn_name (combined_fn (fn)),
 	     GET_MODE_NAME (TYPE_MODE (type_out)),
 	     GET_MODE_NAME (TYPE_MODE (type_in)));
 
@@ -4941,203 +4911,205 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
   in_mode = TYPE_MODE (TREE_TYPE (type_in));
   in_n = TYPE_VECTOR_SUBPARTS (type_in);
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
+  switch (fn)
     {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
+    CASE_CFN_CLZ:
+      if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
 	{
-	case BUILT_IN_CLZIMAX:
-	case BUILT_IN_CLZLL:
-	case BUILT_IN_CLZL:
-	case BUILT_IN_CLZ:
-	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
-	    {
-	      if (out_mode == QImode && out_n == 16)
-		return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
-	      else if (out_mode == HImode && out_n == 8)
-		return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
-	      else if (out_mode == SImode && out_n == 4)
-		return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
-	      else if (out_mode == DImode && out_n == 2)
-		return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
-	    }
-	  break;
-	case BUILT_IN_COPYSIGN:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_CPSGNDP];
-	  break;
-	case BUILT_IN_COPYSIGNF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
-	    return rs6000_builtin_decls[VSX_BUILTIN_CPSGNSP];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
-	  break;
-	case BUILT_IN_POPCOUNTIMAX:
-	case BUILT_IN_POPCOUNTLL:
-	case BUILT_IN_POPCOUNTL:
-	case BUILT_IN_POPCOUNT:
-	  if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
-	    {
-	      if (out_mode == QImode && out_n == 16)
-		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
-	      else if (out_mode == HImode && out_n == 8)
-		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
-	      else if (out_mode == SImode && out_n == 4)
-		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
-	      else if (out_mode == DImode && out_n == 2)
-		return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
-	    }
-	  break;
-	case BUILT_IN_SQRT:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTDP];
-	  break;
-	case BUILT_IN_SQRTF:
-	  if (VECTOR_UNIT_VSX_P (V4SFmode)
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTSP];
-	  break;
-	case BUILT_IN_CEIL:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIP];
-	  break;
-	case BUILT_IN_CEILF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIP];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIP];
-	  break;
-	case BUILT_IN_FLOOR:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIM];
-	  break;
-	case BUILT_IN_FLOORF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIM];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIM];
-	  break;
-	case BUILT_IN_FMA:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVMADDDP];
-	  break;
-	case BUILT_IN_FMAF:
-	  if (VECTOR_UNIT_VSX_P (V4SFmode)
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVMADDSP];
-	  else if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VMADDFP];
-	  break;
-	case BUILT_IN_TRUNC:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIZ];
-	  break;
-	case BUILT_IN_TRUNCF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIZ];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIZ];
-	  break;
-	case BUILT_IN_NEARBYINT:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && flag_unsafe_math_optimizations
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRDPI];
-	  break;
-	case BUILT_IN_NEARBYINTF:
-	  if (VECTOR_UNIT_VSX_P (V4SFmode)
-	      && flag_unsafe_math_optimizations
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRSPI];
-	  break;
-	case BUILT_IN_RINT:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && !flag_trapping_math
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIC];
-	  break;
-	case BUILT_IN_RINTF:
-	  if (VECTOR_UNIT_VSX_P (V4SFmode)
-	      && !flag_trapping_math
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIC];
-	  break;
-	default:
-	  break;
+	  if (out_mode == QImode && out_n == 16)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
+	  else if (out_mode == HImode && out_n == 8)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
+	  else if (out_mode == SImode && out_n == 4)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
+	  else if (out_mode == DImode && out_n == 2)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
 	}
-    }
-
-  else if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
-    {
-      enum rs6000_builtins fn
-	= (enum rs6000_builtins)DECL_FUNCTION_CODE (fndecl);
-      switch (fn)
-	{
-	case RS6000_BUILTIN_RSQRTF:
-	  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRSQRTFP];
-	  break;
-	case RS6000_BUILTIN_RSQRT:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
-	  break;
-	case RS6000_BUILTIN_RECIPF:
-	  if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
-	      && out_mode == SFmode && out_n == 4
-	      && in_mode == SFmode && in_n == 4)
-	    return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRECIPFP];
-	  break;
-	case RS6000_BUILTIN_RECIP:
-	  if (VECTOR_UNIT_VSX_P (V2DFmode)
-	      && out_mode == DFmode && out_n == 2
-	      && in_mode == DFmode && in_n == 2)
-	    return rs6000_builtin_decls[VSX_BUILTIN_RECIP_V2DF];
-	  break;
-	default:
-	  break;
+      break;
+    CASE_CFN_COPYSIGN:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_CPSGNDP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_CPSGNSP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
+      break;
+    CASE_CFN_POPCOUNT:
+      if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
+	{
+	  if (out_mode == QImode && out_n == 16)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
+	  else if (out_mode == HImode && out_n == 8)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
+	  else if (out_mode == SImode && out_n == 4)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
+	  else if (out_mode == DImode && out_n == 2)
+	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
 	}
+      break;
+    CASE_CFN_SQRT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTDP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTSP];
+      break;
+    CASE_CFN_CEIL:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIP];
+      break;
+    CASE_CFN_FLOOR:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIM];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIM];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIM];
+      break;
+    CASE_CFN_FMA:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVMADDDP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVMADDSP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VMADDFP];
+      break;
+    CASE_CFN_TRUNC:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIZ];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIZ];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIZ];
+      break;
+    CASE_CFN_NEARBYINT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && flag_unsafe_math_optimizations
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPI];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && flag_unsafe_math_optimizations
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPI];
+      break;
+    CASE_CFN_RINT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && !flag_trapping_math
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIC];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && !flag_trapping_math
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIC];
+      break;
+    default:
+      break;
     }
 
   /* Generate calls to libmass if appropriate.  */
   if (rs6000_veclib_handler)
-    return rs6000_veclib_handler (fndecl, type_out, type_in);
+    return rs6000_veclib_handler (combined_fn (fn), type_out, type_in);
+
+  return NULL_TREE;
+}
+
+/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
 
+static tree
+rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
+				       tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "rs6000_builtin_md_vectorized_function (%s, %s, %s)\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+	     GET_MODE_NAME (TYPE_MODE (type_out)),
+	     GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE
+      || !TARGET_VECTORIZE_BUILTINS)
+    return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+  enum rs6000_builtins fn
+    = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);
+  switch (fn)
+    {
+    case RS6000_BUILTIN_RSQRTF:
+      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRSQRTFP];
+      break;
+    case RS6000_BUILTIN_RSQRT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
+      break;
+    case RS6000_BUILTIN_RECIPF:
+      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRECIPFP];
+      break;
+    case RS6000_BUILTIN_RECIP:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls[VSX_BUILTIN_RECIP_V2DF];
+      break;
+    default:
+      break;
+    }
   return NULL_TREE;
 }
 \f
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f394db7..20a77d1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5668,11 +5668,17 @@ If this hook is defined, the autovectorizer will use the
 conversion. Otherwise, it will return @code{NULL_TREE}.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (tree @var{fndecl}, tree @var{vec_type_out}, tree @var{vec_type_in})
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
-vectorized variant of the builtin function with builtin function code
+vectorized variant of the function with the @code{combined_fn} code
 @var{code} or @code{NULL_TREE} if such a function is not available.
-The value of @var{fndecl} is the builtin function declaration.  The
+The return type of the vectorized function shall be of vector type
+@var{vec_type_out} and the argument types should be @var{vec_type_in}.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION (tree @var{fndecl}, tree @var{vec_type_out}, tree @var{vec_type_in})
+This hook should return the decl of a function that implements the
+vectorized variant of target built-in function @code{fndecl}.  The
 return type of the vectorized function shall be of vector type
 @var{vec_type_out} and the argument types should be @var{vec_type_in}.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d188c57..b1c6d1e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4230,6 +4230,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
+@hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
+
 @hook TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
 
 @hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE
diff --git a/gcc/target.def b/gcc/target.def
index c7ec292..dddbd2c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1728,18 +1728,28 @@ the argument @var{OFF} to @code{REALIGN_LOAD}, in which case the low\n\
 log2(@var{VS}) @minus{} 1 bits of @var{addr} will be considered.",
  tree, (void), NULL)
 
-/* Returns a code for builtin that realizes vectorized version of
-   function, or NULL_TREE if not available.  */
+/* Returns a built-in function that realizes the vectorized version of
+   a target-independent function, or NULL_TREE if not available.  */
 DEFHOOK
 (builtin_vectorized_function,
  "This hook should return the decl of a function that implements the\n\
-vectorized variant of the builtin function with builtin function code\n\
+vectorized variant of the function with the @code{combined_fn} code\n\
 @var{code} or @code{NULL_TREE} if such a function is not available.\n\
-The value of @var{fndecl} is the builtin function declaration.  The\n\
+The return type of the vectorized function shall be of vector type\n\
+@var{vec_type_out} and the argument types should be @var{vec_type_in}.",
+ tree, (unsigned code, tree vec_type_out, tree vec_type_in),
+ default_builtin_vectorized_function)
+
+/* Returns a built-in function that realizes the vectorized version of
+   a target-specific function, or NULL_TREE if not available.  */
+DEFHOOK
+(builtin_md_vectorized_function,
+ "This hook should return the decl of a function that implements the\n\
+vectorized variant of target built-in function @code{fndecl}.  The\n\
 return type of the vectorized function shall be of vector type\n\
 @var{vec_type_out} and the argument types should be @var{vec_type_in}.",
  tree, (tree fndecl, tree vec_type_out, tree vec_type_in),
- default_builtin_vectorized_function)
+ default_builtin_md_vectorized_function)
 
 /* Returns a function declaration for a builtin that realizes the
    vector conversion, or NULL_TREE if not available.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 14324b7..7852670 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -533,9 +533,15 @@ default_invalid_within_doloop (const rtx_insn *insn)
 /* Mapping of builtin functions to vectorized variants.  */
 
 tree
-default_builtin_vectorized_function (tree fndecl ATTRIBUTE_UNUSED,
-				     tree type_out ATTRIBUTE_UNUSED,
-				     tree type_in ATTRIBUTE_UNUSED)
+default_builtin_vectorized_function (unsigned int, tree, tree)
+{
+  return NULL_TREE;
+}
+
+/* Mapping of target builtin functions to vectorized variants.  */
+
+tree
+default_builtin_md_vectorized_function (tree, tree, tree)
 {
   return NULL_TREE;
 }
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a8e7ebb..ea263da 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -83,7 +83,8 @@ extern bool default_has_ifunc_p (void);
 
 extern const char * default_invalid_within_doloop (const rtx_insn *);
 
-extern tree default_builtin_vectorized_function (tree, tree, tree);
+extern tree default_builtin_vectorized_function (unsigned int, tree, tree);
+extern tree default_builtin_md_vectorized_function (tree, tree, tree);
 
 extern tree default_builtin_vectorized_conversion (unsigned int, tree, tree);
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 51dff9e..75389c4 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
 tree
 vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
 {
-  tree fndecl = gimple_call_fndecl (call);
-
-  /* We only handle functions that do not read or clobber memory -- i.e.
-     const or novops ones.  */
-  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (call))
     return NULL_TREE;
 
-  if (!fndecl
-      || TREE_CODE (fndecl) != FUNCTION_DECL
-      || !DECL_BUILT_IN (fndecl))
-    return NULL_TREE;
+  combined_fn fn = gimple_call_combined_fn (call);
+  if (fn != CFN_LAST)
+    return targetm.vectorize.builtin_vectorized_function
+      (fn, vectype_out, vectype_in);
+
+  if (gimple_call_builtin_p (call, BUILT_IN_MD))
+    return targetm.vectorize.builtin_md_vectorized_function
+      (gimple_call_fndecl (call), vectype_out, vectype_in);
 
-  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
-						        vectype_in);
+  return NULL_TREE;
 }
 
 

[-- Attachment #3: with-b.patch --]
[-- Type: text/x-diff, Size: 47765 bytes --]
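
For readers skimming the hunks, here is a rough standalone sketch of the
new dispatch order in vectorizable_function and of how a CASE_CFN_* label
group lets one case body serve the float, double and internal-function
forms.  Everything in it is a simplified stand-in and an assumption made
for illustration: the miniature combined_fn enum, SKETCH_CASE_CFN_SQRT,
sketch_call, the element-mode enum and the string "decls" are hypothetical
and are not the real GCC definitions.  Only the order of the checks
(memory use, then combined_fn, then BUILT_IN_MD) is taken from the patch.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical miniature of combined_fn: one code space for normal
// built-ins and internal functions, ending in CFN_LAST ("no match").
enum combined_fn
{
  CFN_BUILT_IN_SQRTF,
  CFN_BUILT_IN_SQRT,
  CFN_BUILT_IN_SQRTL,
  CFN_SQRT,
  CFN_LAST
};

// Assumed shape of a CASE_CFN_* macro: every type variant of the
// built-in plus the internal function share a single label group.
#define SKETCH_CASE_CFN_SQRT \
  case CFN_BUILT_IN_SQRTF: \
  case CFN_BUILT_IN_SQRT: \
  case CFN_BUILT_IN_SQRTL: \
  case CFN_SQRT

enum elt_mode { SKETCH_SF, SKETCH_DF };

// Stand-in for the facts the vectoriser queries about a gcall.
struct sketch_call
{
  combined_fn fn;       // result of gimple_call_combined_fn
  bool has_vuse;        // call reads or clobbers memory
  bool is_md_builtin;   // call is a BUILT_IN_MD function
};

// Target hook keyed on the combined function code.  Because one label
// group covers all variants, the element mode selects the vector form.
const char *
sketch_builtin_vectorized_function (unsigned int fn, elt_mode mode, int n)
{
  switch (fn)
    {
    SKETCH_CASE_CFN_SQRT:
      if (mode == SKETCH_DF && n == 2)
        return "sqrtv2df2";
      if (mode == SKETCH_SF && n == 4)
        return "sqrtv4sf2";
      break;

    default:
      break;
    }
  return nullptr;
}

// Separate hook for target-specific (md) built-ins.
const char *
sketch_builtin_md_vectorized_function (elt_mode, int)
{
  return "md_vector_variant";
}

// Same order of checks as the patched vectorizable_function.
const char *
sketch_vectorizable_function (const sketch_call &call, elt_mode mode, int n)
{
  if (call.has_vuse)
    return nullptr;
  if (call.fn != CFN_LAST)
    return sketch_builtin_vectorized_function (call.fn, mode, n);
  if (call.is_md_builtin)
    return sketch_builtin_md_vectorized_function (mode, n);
  return nullptr;
}

// Small self-check exercising each path through the dispatcher.
bool
sketch_checks ()
{
  sketch_call sqrtf_call = { CFN_BUILT_IN_SQRTF, false, false };
  sketch_call md_call = { CFN_LAST, false, true };
  sketch_call mem_call = { CFN_BUILT_IN_SQRT, true, false };
  const char *v4sf = sketch_vectorizable_function (sqrtf_call, SKETCH_SF, 4);
  const char *md = sketch_vectorizable_function (md_call, SKETCH_DF, 2);
  return (v4sf && std::strcmp (v4sf, "sqrtv4sf2") == 0
          && md && std::strcmp (md, "md_vector_variant") == 0
          && !sketch_vectorizable_function (mem_call, SKETCH_DF, 2));
}
```

The sketch also shows why the separate BUILT_IN_FOO and BUILT_IN_FOOF
cases in the target hooks can be merged: once the label group matches
both, the existing mode and lane-count tests pick the right vector
variant.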

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index 6b4208f..c4cda4f 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -38,6 +38,7 @@
 #include "expr.h"
 #include "langhooks.h"
 #include "gimple-iterator.h"
+#include "case-cfn-macros.h"
 
 #define v8qi_UP  V8QImode
 #define v4hi_UP  V4HImode
@@ -1258,7 +1259,8 @@ aarch64_expand_builtin (tree exp,
 }
 
 tree
-aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
+				     tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
@@ -1282,44 +1284,35 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 	: (AARCH64_CHECK_BUILTIN_MODE (2, S) \
 	   ? aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_##N##v2sf] \
 	   : NULL_TREE)))
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
   switch (fn)
     {
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Fmode && out_n == C \
    && in_mode == N##Fmode && in_n == C)
-	case BUILT_IN_FLOOR:
-	case BUILT_IN_FLOORF:
+    CASE_CFN_FLOOR:
       return AARCH64_FIND_FRINT_VARIANT (floor);
-	case BUILT_IN_CEIL:
-	case BUILT_IN_CEILF:
+    CASE_CFN_CEIL:
       return AARCH64_FIND_FRINT_VARIANT (ceil);
-	case BUILT_IN_TRUNC:
-	case BUILT_IN_TRUNCF:
+    CASE_CFN_TRUNC:
       return AARCH64_FIND_FRINT_VARIANT (btrunc);
-	case BUILT_IN_ROUND:
-	case BUILT_IN_ROUNDF:
+    CASE_CFN_ROUND:
       return AARCH64_FIND_FRINT_VARIANT (round);
-	case BUILT_IN_NEARBYINT:
-	case BUILT_IN_NEARBYINTF:
+    CASE_CFN_NEARBYINT:
       return AARCH64_FIND_FRINT_VARIANT (nearbyint);
-	case BUILT_IN_SQRT:
-	case BUILT_IN_SQRTF:
+    CASE_CFN_SQRT:
       return AARCH64_FIND_FRINT_VARIANT (sqrt);
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == SImode && out_n == C \
    && in_mode == N##Imode && in_n == C)
-        case BUILT_IN_CLZ:
+    CASE_CFN_CLZ:
       {
 	if (AARCH64_CHECK_BUILTIN_MODE (4, S))
 	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
 	return NULL_TREE;
       }
-	case BUILT_IN_CTZ:
+    CASE_CFN_CTZ:
       {
 	if (AARCH64_CHECK_BUILTIN_MODE (2, S))
 	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
@@ -1331,10 +1324,9 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
    && in_mode == N##Fmode && in_n == C)
-	case BUILT_IN_LFLOOR:
-	case BUILT_IN_LFLOORF:
-	case BUILT_IN_LLFLOOR:
-	case BUILT_IN_IFLOORF:
+    CASE_CFN_IFLOOR:
+    CASE_CFN_LFLOOR:
+    CASE_CFN_LLFLOOR:
       {
 	enum aarch64_builtins builtin;
 	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
@@ -1348,10 +1340,9 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 
 	return aarch64_builtin_decls[builtin];
       }
-	case BUILT_IN_LCEIL:
-	case BUILT_IN_LCEILF:
-	case BUILT_IN_LLCEIL:
-	case BUILT_IN_ICEILF:
+    CASE_CFN_ICEIL:
+    CASE_CFN_LCEIL:
+    CASE_CFN_LLCEIL:
       {
 	enum aarch64_builtins builtin;
 	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
@@ -1365,8 +1356,9 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 
 	return aarch64_builtin_decls[builtin];
       }
-	case BUILT_IN_LROUND:
-	case BUILT_IN_IROUNDF:
+    CASE_CFN_IROUND:
+    CASE_CFN_LROUND:
+    CASE_CFN_LLROUND:
       {
 	enum aarch64_builtins builtin;
 	if (AARCH64_CHECK_BUILTIN_MODE (2, D))
@@ -1380,7 +1372,7 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 
 	return aarch64_builtin_decls[builtin];
       }
-	case BUILT_IN_BSWAP16:
+    case CFN_BUILT_IN_BSWAP16:
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
@@ -1391,14 +1383,14 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv8hi];
       else
 	return NULL_TREE;
-	case BUILT_IN_BSWAP32:
+    case CFN_BUILT_IN_BSWAP32:
       if (AARCH64_CHECK_BUILTIN_MODE (2, S))
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2si];
       else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv4si];
       else
 	return NULL_TREE;
-	case BUILT_IN_BSWAP64:
+    case CFN_BUILT_IN_BSWAP64:
       if (AARCH64_CHECK_BUILTIN_MODE (2, D))
 	return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOPU_bswapv2di];
       else
@@ -1406,7 +1398,6 @@ aarch64_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
     default:
       return NULL_TREE;
     }
-    }
 
   return NULL_TREE;
 }
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 0f20f60..c77dbbf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -407,10 +407,7 @@ tree aarch64_builtin_decl (unsigned, bool ATTRIBUTE_UNUSED);
 
 tree aarch64_builtin_rsqrt (unsigned int, bool);
 
-tree
-aarch64_builtin_vectorized_function (tree fndecl,
-				     tree type_out,
-				     tree type_in);
+tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
 
 extern void aarch64_split_combinev16qi (rtx operands[3]);
 extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index bad3dc3..ee2e7b0 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -35,6 +35,7 @@
 #include "explow.h"
 #include "expr.h"
 #include "langhooks.h"
+#include "case-cfn-macros.h"
 
 #define SIMD_MAX_BUILTIN_ARGS 5
 
@@ -2812,7 +2813,7 @@ arm_expand_builtin (tree exp,
 }
 
 tree
-arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
+arm_builtin_vectorized_function (unsigned int fn, tree type_out, tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
@@ -2849,18 +2850,15 @@ arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
       ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##v4sf, false) \
       : NULL_TREE))
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
   switch (fn)
     {
-          case BUILT_IN_FLOORF:
+    CASE_CFN_FLOOR:
       return ARM_FIND_VRINT_VARIANT (vrintm);
-          case BUILT_IN_CEILF:
+    CASE_CFN_CEIL:
       return ARM_FIND_VRINT_VARIANT (vrintp);
-          case BUILT_IN_TRUNCF:
+    CASE_CFN_TRUNC:
       return ARM_FIND_VRINT_VARIANT (vrintz);
-          case BUILT_IN_ROUNDF:
+    CASE_CFN_ROUND:
       return ARM_FIND_VRINT_VARIANT (vrinta);
 #undef ARM_CHECK_BUILTIN_MODE_1
 #define ARM_CHECK_BUILTIN_MODE_1(C) \
@@ -2880,42 +2878,42 @@ arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
    : (ARM_CHECK_BUILTIN_MODE (4) \
      ? arm_builtin_decl(ARM_BUILTIN_NEON_##N##uv4sfv4si, false) \
      : NULL_TREE))
-          case BUILT_IN_LROUNDF:
-            return out_unsigned_p
+    CASE_CFN_LROUND:
+      return (out_unsigned_p
 	      ? ARM_FIND_VCVTU_VARIANT (vcvta)
-                     : ARM_FIND_VCVT_VARIANT (vcvta);
-          case BUILT_IN_LCEILF:
-            return out_unsigned_p
+	      : ARM_FIND_VCVT_VARIANT (vcvta));
+    CASE_CFN_LCEIL:
+      return (out_unsigned_p
 	      ? ARM_FIND_VCVTU_VARIANT (vcvtp)
-                     : ARM_FIND_VCVT_VARIANT (vcvtp);
-          case BUILT_IN_LFLOORF:
-            return out_unsigned_p
+	      : ARM_FIND_VCVT_VARIANT (vcvtp));
+    CASE_CFN_LFLOOR:
+      return (out_unsigned_p
 	      ? ARM_FIND_VCVTU_VARIANT (vcvtm)
-                     : ARM_FIND_VCVT_VARIANT (vcvtm);
+	      : ARM_FIND_VCVT_VARIANT (vcvtm));
 #undef ARM_CHECK_BUILTIN_MODE
 #define ARM_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##mode && out_n == C \
    && in_mode == N##mode && in_n == C)
-          case BUILT_IN_BSWAP16:
+    case CFN_BUILT_IN_BSWAP16:
       if (ARM_CHECK_BUILTIN_MODE (4, HI))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4hi, false);
       else if (ARM_CHECK_BUILTIN_MODE (8, HI))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv8hi, false);
       else
 	return NULL_TREE;
-          case BUILT_IN_BSWAP32:
+    case CFN_BUILT_IN_BSWAP32:
       if (ARM_CHECK_BUILTIN_MODE (2, SI))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2si, false);
       else if (ARM_CHECK_BUILTIN_MODE (4, SI))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv4si, false);
       else
 	return NULL_TREE;
-          case BUILT_IN_BSWAP64:
+    case CFN_BUILT_IN_BSWAP64:
       if (ARM_CHECK_BUILTIN_MODE (2, DI))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_bswapv2di, false);
       else
 	return NULL_TREE;
-	  case BUILT_IN_COPYSIGNF:
+    CASE_CFN_COPYSIGN:
       if (ARM_CHECK_BUILTIN_MODE (2, SF))
 	return arm_builtin_decl (ARM_BUILTIN_NEON_copysignfv2sf, false);
       else if (ARM_CHECK_BUILTIN_MODE (4, SF))
@@ -2926,7 +2924,6 @@ arm_builtin_vectorized_function (tree fndecl, tree type_out, tree type_in)
     default:
       return NULL_TREE;
     }
-    }
   return NULL_TREE;
 }
 #undef ARM_FIND_VCVT_VARIANT
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index f9b1276..10c96b2 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -84,7 +84,7 @@ extern char *neon_output_shift_immediate (const char *, char, rtx *,
 extern void neon_pairwise_reduce (rtx, rtx, machine_mode,
 				  rtx (*) (rtx, rtx, rtx));
 extern rtx neon_make_constant (rtx);
-extern tree arm_builtin_vectorized_function (tree, tree, tree);
+extern tree arm_builtin_vectorized_function (unsigned int, tree, tree);
 extern void neon_expand_vector_init (rtx, rtx);
 extern void neon_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
 extern void neon_const_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index bb37aba..a1d59a5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -73,6 +73,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-chkp.h"
 #include "rtl-chkp.h"
 #include "dbgcnt.h"
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -2611,10 +2612,10 @@ static int ix86_tune_defaulted;
 static int ix86_arch_specified;
 
 /* Vectorization library interface and handlers.  */
-static tree (*ix86_veclib_handler) (enum built_in_function, tree, tree);
+static tree (*ix86_veclib_handler) (combined_fn, tree, tree);
 
-static tree ix86_veclibabi_svml (enum built_in_function, tree, tree);
-static tree ix86_veclibabi_acml (enum built_in_function, tree, tree);
+static tree ix86_veclibabi_svml (combined_fn, tree, tree);
+static tree ix86_veclibabi_acml (combined_fn, tree, tree);
 
 /* Processor target table, indexed by processor number */
 struct ptt
@@ -41723,21 +41724,19 @@ ix86_store_returned_bounds (rtx slot, rtx bounds)
   emit_move_insn (slot, bounds);
 }
 
-/* Returns a function decl for a vectorized version of the builtin function
-   with builtin function code FN and the result vector type TYPE, or NULL_TREE
+/* Returns a function decl for a vectorized version of the combined function
+   with combined_fn code FN and the result vector type TYPE, or NULL_TREE
    if it is not available.  */
 
 static tree
-ix86_builtin_vectorized_function (tree fndecl, tree type_out,
+ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
 				  tree type_in)
 {
   machine_mode in_mode, out_mode;
   int in_n, out_n;
-  enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
 
   if (TREE_CODE (type_out) != VECTOR_TYPE
-      || TREE_CODE (type_in) != VECTOR_TYPE
-      || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
+      || TREE_CODE (type_in) != VECTOR_TYPE)
     return NULL_TREE;
 
   out_mode = TYPE_MODE (TREE_TYPE (type_out));
@@ -41747,7 +41746,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 
   switch (fn)
     {
-    case BUILT_IN_SQRT:
+    CASE_CFN_SQRT:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -41757,17 +41756,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD512);
 	}
-      break;
-
-    case BUILT_IN_EXP2F:
-      if (out_mode == SFmode && in_mode == SFmode)
-	{
-	  if (out_n == 16 && in_n == 16)
-	    return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
-	}
-      break;
-
-    case BUILT_IN_SQRTF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41779,9 +41767,17 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IFLOOR:
-    case BUILT_IN_LFLOOR:
-    case BUILT_IN_LLFLOOR:
+    CASE_CFN_EXP2:
+      if (out_mode == SFmode && in_mode == SFmode)
+	{
+	  if (out_n == 16 && in_n == 16)
+	    return ix86_get_builtin (IX86_BUILTIN_EXP2PS);
+	}
+      break;
+
+    CASE_CFN_IFLOOR:
+    CASE_CFN_LFLOOR:
+    CASE_CFN_LLFLOOR:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41795,15 +41791,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_FLOORPD_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_IFLOORF:
-    case BUILT_IN_LFLOORF:
-    case BUILT_IN_LLFLOORF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41813,9 +41800,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_ICEIL:
-    case BUILT_IN_LCEIL:
-    case BUILT_IN_LLCEIL:
+    CASE_CFN_ICEIL:
+    CASE_CFN_LCEIL:
+    CASE_CFN_LLCEIL:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41829,15 +41816,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_CEILPD_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_ICEILF:
-    case BUILT_IN_LCEILF:
-    case BUILT_IN_LLCEILF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41847,9 +41825,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IRINT:
-    case BUILT_IN_LRINT:
-    case BUILT_IN_LLRINT:
+    CASE_CFN_IRINT:
+    CASE_CFN_LRINT:
+    CASE_CFN_LLRINT:
       if (out_mode == SImode && in_mode == DFmode)
 	{
 	  if (out_n == 4 && in_n == 2)
@@ -41857,11 +41835,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_VEC_PACK_SFIX256);
 	}
-      break;
-
-    case BUILT_IN_IRINTF:
-    case BUILT_IN_LRINTF:
-    case BUILT_IN_LLRINTF:
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41871,9 +41844,9 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_IROUND:
-    case BUILT_IN_LROUND:
-    case BUILT_IN_LLROUND:
+    CASE_CFN_IROUND:
+    CASE_CFN_LROUND:
+    CASE_CFN_LLROUND:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41887,15 +41860,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 16 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ_VEC_PACK_SFIX512);
 	}
-      break;
-
-    case BUILT_IN_IROUNDF:
-    case BUILT_IN_LROUNDF:
-    case BUILT_IN_LLROUNDF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SImode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41905,7 +41869,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_COPYSIGN:
+    CASE_CFN_COPYSIGN:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -41915,9 +41879,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 8 && in_n == 8)
 	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD512);
 	}
-      break;
-
-    case BUILT_IN_COPYSIGNF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41929,7 +41890,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_FLOOR:
+    CASE_CFN_FLOOR:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41941,13 +41902,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_FLOORPD256);
 	}
-      break;
-
-    case BUILT_IN_FLOORF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41957,7 +41911,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_CEIL:
+    CASE_CFN_CEIL:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41969,13 +41923,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_CEILPD256);
 	}
-      break;
-
-    case BUILT_IN_CEILF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -41985,7 +41932,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_TRUNC:
+    CASE_CFN_TRUNC:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -41997,13 +41944,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_TRUNCPD256);
 	}
-      break;
-
-    case BUILT_IN_TRUNCF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42013,7 +41953,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_RINT:
+    CASE_CFN_RINT:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -42025,13 +41965,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_RINTPD256);
 	}
-      break;
-
-    case BUILT_IN_RINTF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42041,7 +41974,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_ROUND:
+    CASE_CFN_ROUND:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
 	break;
@@ -42053,13 +41986,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  else if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ256);
 	}
-      break;
-
-    case BUILT_IN_ROUNDF:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42069,7 +41995,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	}
       break;
 
-    case BUILT_IN_FMA:
+    CASE_CFN_FMA:
       if (out_mode == DFmode && in_mode == DFmode)
 	{
 	  if (out_n == 2 && in_n == 2)
@@ -42077,9 +42003,6 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 	  if (out_n == 4 && in_n == 4)
 	    return ix86_get_builtin (IX86_BUILTIN_VFMADDPD256);
 	}
-      break;
-
-    case BUILT_IN_FMAF:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
 	  if (out_n == 4 && in_n == 4)
@@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
 
   /* Dispatch to a handler for a vectorization library.  */
   if (ix86_veclib_handler)
-    return ix86_veclib_handler ((enum built_in_function) fn, type_out,
-				type_in);
+    return ix86_veclib_handler (combined_fn (fn), type_out, type_in);
 
   return NULL_TREE;
 }
@@ -42105,7 +42027,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
    a library with vectorized intrinsics.  */
 
 static tree
-ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
+ix86_veclibabi_svml (combined_fn fn, tree type_out, tree type_in)
 {
   char name[20];
   tree fntype, new_fndecl, args;
@@ -42128,47 +42050,26 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
 
   switch (fn)
     {
-    case BUILT_IN_EXP:
-    case BUILT_IN_LOG:
-    case BUILT_IN_LOG10:
-    case BUILT_IN_POW:
-    case BUILT_IN_TANH:
-    case BUILT_IN_TAN:
-    case BUILT_IN_ATAN:
-    case BUILT_IN_ATAN2:
-    case BUILT_IN_ATANH:
-    case BUILT_IN_CBRT:
-    case BUILT_IN_SINH:
-    case BUILT_IN_SIN:
-    case BUILT_IN_ASINH:
-    case BUILT_IN_ASIN:
-    case BUILT_IN_COSH:
-    case BUILT_IN_COS:
-    case BUILT_IN_ACOSH:
-    case BUILT_IN_ACOS:
-      if (el_mode != DFmode || n != 2)
-	return NULL_TREE;
-      break;
-
-    case BUILT_IN_EXPF:
-    case BUILT_IN_LOGF:
-    case BUILT_IN_LOG10F:
-    case BUILT_IN_POWF:
-    case BUILT_IN_TANHF:
-    case BUILT_IN_TANF:
-    case BUILT_IN_ATANF:
-    case BUILT_IN_ATAN2F:
-    case BUILT_IN_ATANHF:
-    case BUILT_IN_CBRTF:
-    case BUILT_IN_SINHF:
-    case BUILT_IN_SINF:
-    case BUILT_IN_ASINHF:
-    case BUILT_IN_ASINF:
-    case BUILT_IN_COSHF:
-    case BUILT_IN_COSF:
-    case BUILT_IN_ACOSHF:
-    case BUILT_IN_ACOSF:
-      if (el_mode != SFmode || n != 4)
+    CASE_CFN_EXP:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG10:
+    CASE_CFN_POW:
+    CASE_CFN_TANH:
+    CASE_CFN_TAN:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATAN2:
+    CASE_CFN_ATANH:
+    CASE_CFN_CBRT:
+    CASE_CFN_SINH:
+    CASE_CFN_SIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ASIN:
+    CASE_CFN_COSH:
+    CASE_CFN_COS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ACOS:
+      if ((el_mode != DFmode || n != 2)
+	  && (el_mode != SFmode || n != 4))
 	return NULL_TREE;
       break;
 
@@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
       return NULL_TREE;
     }
 
-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
 
-  if (fn == BUILT_IN_LOGF)
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
     strcpy (name, "vmlsLn4");
-  else if (fn == BUILT_IN_LOG)
+  else if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOG)
     strcpy (name, "vmldLn2");
   else if (n == 4)
     {
@@ -42194,9 +42096,7 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
   name[4] &= ~0x20;
 
   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-       args;
-       args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
     arity++;
 
   if (arity == 1)
@@ -42219,7 +42119,7 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
    a library with vectorized intrinsics.  */
 
 static tree
-ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
+ix86_veclibabi_acml (combined_fn fn, tree type_out, tree type_in)
 {
   char name[20] = "__vr.._";
   tree fntype, new_fndecl, args;
@@ -42245,30 +42145,23 @@ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
 
   switch (fn)
     {
-    case BUILT_IN_SIN:
-    case BUILT_IN_COS:
-    case BUILT_IN_EXP:
-    case BUILT_IN_LOG:
-    case BUILT_IN_LOG2:
-    case BUILT_IN_LOG10:
+    CASE_CFN_SIN:
+    CASE_CFN_COS:
+    CASE_CFN_EXP:
+    CASE_CFN_LOG:
+    CASE_CFN_LOG2:
+    CASE_CFN_LOG10:
+      if (el_mode == DFmode && n == 2)
+	{
 	  name[4] = 'd';
 	  name[5] = '2';
-      if (el_mode != DFmode
-	  || n != 2)
-	return NULL_TREE;
-      break;
-
-    case BUILT_IN_SINF:
-    case BUILT_IN_COSF:
-    case BUILT_IN_EXPF:
-    case BUILT_IN_POWF:
-    case BUILT_IN_LOGF:
-    case BUILT_IN_LOG2F:
-    case BUILT_IN_LOG10F:
+	}
+      else if (el_mode == SFmode && n == 4)
+	{
 	  name[4] = 's';
 	  name[5] = '4';
-      if (el_mode != SFmode
-	  || n != 4)
+	}
+      else
 	return NULL_TREE;
       break;
 
@@ -42276,13 +42169,12 @@ ix86_veclibabi_acml (enum built_in_function fn, tree type_out, tree type_in)
       return NULL_TREE;
     }
 
-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
   sprintf (name + 7, "%s", bname+10);
 
   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-       args;
-       args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
     arity++;
 
   if (arity == 1)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8bdd646..26a0410 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -70,6 +70,7 @@
 #if TARGET_MACHO
 #include "gstab.h"  /* for N_SLINE */
 #endif
+#include "case-cfn-macros.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1077,7 +1078,7 @@ static const struct rs6000_builtin_info_type rs6000_builtin_info[] =
 #undef RS6000_BUILTIN_X
 
 /* Support for -mveclibabi=<xxx> to control which vector library to use.  */
-static tree (*rs6000_veclib_handler) (tree, tree, tree);
+static tree (*rs6000_veclib_handler) (combined_fn, tree, tree);
 
 \f
 static bool rs6000_debug_legitimate_address_p (machine_mode, rtx, bool);
@@ -1087,7 +1088,7 @@ static int rs6000_ra_ever_killed (void);
 static tree rs6000_handle_longcall_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
-static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
+static tree rs6000_builtin_vectorized_libmass (combined_fn, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
@@ -1576,6 +1577,10 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION \
   rs6000_builtin_vectorized_function
 
+#undef TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
+#define TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION \
+  rs6000_builtin_md_vectorized_function
+
 #if !TARGET_MACHO
 #undef TARGET_STACK_PROTECT_FAIL
 #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail
@@ -4775,7 +4780,8 @@ rs6000_destroy_cost_data (void *data)
    library with vectorized intrinsics.  */
 
 static tree
-rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
+rs6000_builtin_vectorized_libmass (combined_fn fn, tree type_out,
+				   tree type_in)
 {
   char name[32];
   const char *suffix = NULL;
@@ -4800,93 +4806,57 @@ rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
       || n != in_n)
     return NULL_TREE;
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
   switch (fn)
     {
-	case BUILT_IN_ATAN2:
-	case BUILT_IN_HYPOT:
-	case BUILT_IN_POW:
+    CASE_CFN_ATAN2:
+    CASE_CFN_HYPOT:
+    CASE_CFN_POW:
       n_args = 2;
       /* fall through */
 
-	case BUILT_IN_ACOS:
-	case BUILT_IN_ACOSH:
-	case BUILT_IN_ASIN:
-	case BUILT_IN_ASINH:
-	case BUILT_IN_ATAN:
-	case BUILT_IN_ATANH:
-	case BUILT_IN_CBRT:
-	case BUILT_IN_COS:
-	case BUILT_IN_COSH:
-	case BUILT_IN_ERF:
-	case BUILT_IN_ERFC:
-	case BUILT_IN_EXP2:
-	case BUILT_IN_EXP:
-	case BUILT_IN_EXPM1:
-	case BUILT_IN_LGAMMA:
-	case BUILT_IN_LOG10:
-	case BUILT_IN_LOG1P:
-	case BUILT_IN_LOG2:
-	case BUILT_IN_LOG:
-	case BUILT_IN_SIN:
-	case BUILT_IN_SINH:
-	case BUILT_IN_SQRT:
-	case BUILT_IN_TAN:
-	case BUILT_IN_TANH:
-	  bdecl = builtin_decl_implicit (fn);
+    CASE_CFN_ACOS:
+    CASE_CFN_ACOSH:
+    CASE_CFN_ASIN:
+    CASE_CFN_ASINH:
+    CASE_CFN_ATAN:
+    CASE_CFN_ATANH:
+    CASE_CFN_CBRT:
+    CASE_CFN_COS:
+    CASE_CFN_COSH:
+    CASE_CFN_ERF:
+    CASE_CFN_ERFC:
+    CASE_CFN_EXP2:
+    CASE_CFN_EXP:
+    CASE_CFN_EXPM1:
+    CASE_CFN_LGAMMA:
+    CASE_CFN_LOG10:
+    CASE_CFN_LOG1P:
+    CASE_CFN_LOG2:
+    CASE_CFN_LOG:
+    CASE_CFN_SIN:
+    CASE_CFN_SINH:
+    CASE_CFN_SQRT:
+    CASE_CFN_TAN:
+    CASE_CFN_TANH:
+      if (el_mode == DFmode && n == 2)
+	{
+	  bdecl = mathfn_built_in (double_type_node, fn);
 	  suffix = "d2";				/* pow -> powd2 */
-	  if (el_mode != DFmode
-	      || n != 2
-	      || !bdecl)
-	    return NULL_TREE;
-	  break;
-
-	case BUILT_IN_ATAN2F:
-	case BUILT_IN_HYPOTF:
-	case BUILT_IN_POWF:
-	  n_args = 2;
-	  /* fall through */
-
-	case BUILT_IN_ACOSF:
-	case BUILT_IN_ACOSHF:
-	case BUILT_IN_ASINF:
-	case BUILT_IN_ASINHF:
-	case BUILT_IN_ATANF:
-	case BUILT_IN_ATANHF:
-	case BUILT_IN_CBRTF:
-	case BUILT_IN_COSF:
-	case BUILT_IN_COSHF:
-	case BUILT_IN_ERFF:
-	case BUILT_IN_ERFCF:
-	case BUILT_IN_EXP2F:
-	case BUILT_IN_EXPF:
-	case BUILT_IN_EXPM1F:
-	case BUILT_IN_LGAMMAF:
-	case BUILT_IN_LOG10F:
-	case BUILT_IN_LOG1PF:
-	case BUILT_IN_LOG2F:
-	case BUILT_IN_LOGF:
-	case BUILT_IN_SINF:
-	case BUILT_IN_SINHF:
-	case BUILT_IN_SQRTF:
-	case BUILT_IN_TANF:
-	case BUILT_IN_TANHF:
-	  bdecl = builtin_decl_implicit (fn);
+	}
+      else if (el_mode == SFmode && n == 4)
+	{
+	  bdecl = mathfn_built_in (float_type_node, fn);
 	  suffix = "4";					/* powf -> powf4 */
-	  if (el_mode != SFmode
-	      || n != 4
-	      || !bdecl)
+	}
+      else
+	return NULL_TREE;
+      if (!bdecl)
 	return NULL_TREE;
       break;
 
     default:
       return NULL_TREE;
     }
-    }
-  else
-    return NULL_TREE;
 
   gcc_assert (suffix != NULL);
   bname = IDENTIFIER_POINTER (DECL_NAME (bdecl));
@@ -4919,7 +4889,7 @@ rs6000_builtin_vectorized_libmass (tree fndecl, tree type_out, tree type_in)
    if it is not available.  */
 
 static tree
-rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
+rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
 				    tree type_in)
 {
   machine_mode in_mode, out_mode;
@@ -4927,7 +4897,7 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
 
   if (TARGET_DEBUG_BUILTIN)
     fprintf (stderr, "rs6000_builtin_vectorized_function (%s, %s, %s)\n",
-	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+	     combined_fn_name (combined_fn (fn)),
 	     GET_MODE_NAME (TYPE_MODE (type_out)),
 	     GET_MODE_NAME (TYPE_MODE (type_in)));
 
@@ -4941,15 +4911,9 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
   in_mode = TYPE_MODE (TREE_TYPE (type_in));
   in_n = TYPE_VECTOR_SUBPARTS (type_in);
 
-  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_NORMAL)
-    {
-      enum built_in_function fn = DECL_FUNCTION_CODE (fndecl);
   switch (fn)
     {
-	case BUILT_IN_CLZIMAX:
-	case BUILT_IN_CLZLL:
-	case BUILT_IN_CLZL:
-	case BUILT_IN_CLZ:
+    CASE_CFN_CLZ:
       if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
 	{
 	  if (out_mode == QImode && out_n == 16)
@@ -4962,25 +4926,21 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
 	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
 	}
       break;
-	case BUILT_IN_COPYSIGN:
+    CASE_CFN_COPYSIGN:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_CPSGNDP];
-	  break;
-	case BUILT_IN_COPYSIGNF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_CPSGNSP];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
       break;
-	case BUILT_IN_POPCOUNTIMAX:
-	case BUILT_IN_POPCOUNTLL:
-	case BUILT_IN_POPCOUNTL:
-	case BUILT_IN_POPCOUNT:
+    CASE_CFN_POPCOUNT:
       if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
 	{
 	  if (out_mode == QImode && out_n == 16)
@@ -4993,101 +4953,90 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
 	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
 	}
       break;
-	case BUILT_IN_SQRT:
+    CASE_CFN_SQRT:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTDP];
-	  break;
-	case BUILT_IN_SQRTF:
       if (VECTOR_UNIT_VSX_P (V4SFmode)
 	  && out_mode == SFmode && out_n == 4
 	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTSP];
       break;
-	case BUILT_IN_CEIL:
+    CASE_CFN_CEIL:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIP];
-	  break;
-	case BUILT_IN_CEILF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIP];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIP];
       break;
-	case BUILT_IN_FLOOR:
+    CASE_CFN_FLOOR:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIM];
-	  break;
-	case BUILT_IN_FLOORF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIM];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIM];
       break;
-	case BUILT_IN_FMA:
+    CASE_CFN_FMA:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVMADDDP];
-	  break;
-	case BUILT_IN_FMAF:
       if (VECTOR_UNIT_VSX_P (V4SFmode)
 	  && out_mode == SFmode && out_n == 4
 	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVMADDSP];
-	  else if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
 	  && out_mode == SFmode && out_n == 4
 	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VMADDFP];
       break;
-	case BUILT_IN_TRUNC:
+    CASE_CFN_TRUNC:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIZ];
-	  break;
-	case BUILT_IN_TRUNCF:
-	  if (out_mode != SFmode || out_n != 4
-	      || in_mode != SFmode || in_n != 4)
-	    break;
-	  if (VECTOR_UNIT_VSX_P (V4SFmode))
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPIZ];
-	  if (VECTOR_UNIT_ALTIVEC_P (V4SFmode))
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_VRFIZ];
       break;
-	case BUILT_IN_NEARBYINT:
+    CASE_CFN_NEARBYINT:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && flag_unsafe_math_optimizations
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPI];
-	  break;
-	case BUILT_IN_NEARBYINTF:
       if (VECTOR_UNIT_VSX_P (V4SFmode)
 	  && flag_unsafe_math_optimizations
 	  && out_mode == SFmode && out_n == 4
 	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRSPI];
       break;
-	case BUILT_IN_RINT:
+    CASE_CFN_RINT:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && !flag_trapping_math
 	  && out_mode == DFmode && out_n == 2
 	  && in_mode == DFmode && in_n == 2)
 	return rs6000_builtin_decls[VSX_BUILTIN_XVRDPIC];
-	  break;
-	case BUILT_IN_RINTF:
       if (VECTOR_UNIT_VSX_P (V4SFmode)
 	  && !flag_trapping_math
 	  && out_mode == SFmode && out_n == 4
@@ -5097,12 +5046,41 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
     default:
       break;
     }
-    }
 
-  else if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
-    {
+  /* Generate calls to libmass if appropriate.  */
+  if (rs6000_veclib_handler)
+    return rs6000_veclib_handler (combined_fn (fn), type_out, type_in);
+
+  return NULL_TREE;
+}
+
+/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
+
+static tree
+rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
+				       tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "rs6000_builtin_md_vectorized_function (%s, %s, %s)\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+	     GET_MODE_NAME (TYPE_MODE (type_out)),
+	     GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE
+      || !TARGET_VECTORIZE_BUILTINS)
+    return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
   enum rs6000_builtins fn
-	= (enum rs6000_builtins)DECL_FUNCTION_CODE (fndecl);
+    = (enum rs6000_builtins) DECL_FUNCTION_CODE (fndecl);
   switch (fn)
     {
     case RS6000_BUILTIN_RSQRTF:
@@ -5132,12 +5110,6 @@ rs6000_builtin_vectorized_function (tree fndecl, tree type_out,
     default:
       break;
     }
-    }
-
-  /* Generate calls to libmass if appropriate.  */
-  if (rs6000_veclib_handler)
-    return rs6000_veclib_handler (fndecl, type_out, type_in);
-
   return NULL_TREE;
 }
 \f
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f394db7..20a77d1 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5668,11 +5668,17 @@ If this hook is defined, the autovectorizer will use the
 conversion. Otherwise, it will return @code{NULL_TREE}.
 @end deftypefn
 
-@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (tree @var{fndecl}, tree @var{vec_type_out}, tree @var{vec_type_in})
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION (unsigned @var{code}, tree @var{vec_type_out}, tree @var{vec_type_in})
 This hook should return the decl of a function that implements the
-vectorized variant of the builtin function with builtin function code
+vectorized variant of the function with the @code{combined_fn} code
 @var{code} or @code{NULL_TREE} if such a function is not available.
-The value of @var{fndecl} is the builtin function declaration.  The
+The return type of the vectorized function shall be of vector type
+@var{vec_type_out} and the argument types should be @var{vec_type_in}.
+@end deftypefn
+
+@deftypefn {Target Hook} tree TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION (tree @var{fndecl}, tree @var{vec_type_out}, tree @var{vec_type_in})
+This hook should return the decl of a function that implements the
+vectorized variant of target built-in function @code{fndecl}.  The
 return type of the vectorized function shall be of vector type
 @var{vec_type_out} and the argument types should be @var{vec_type_in}.
 @end deftypefn
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d188c57..b1c6d1e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4230,6 +4230,8 @@ address;  but often a machine-dependent strategy can generate better code.
 
 @hook TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION
 
+@hook TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION
+
 @hook TARGET_VECTORIZE_SUPPORT_VECTOR_MISALIGNMENT
 
 @hook TARGET_VECTORIZE_PREFERRED_SIMD_MODE
diff --git a/gcc/target.def b/gcc/target.def
index c7ec292..dddbd2c 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -1728,18 +1728,28 @@ the argument @var{OFF} to @code{REALIGN_LOAD}, in which case the low\n\
 log2(@var{VS}) @minus{} 1 bits of @var{addr} will be considered.",
  tree, (void), NULL)
 
-/* Returns a code for builtin that realizes vectorized version of
-   function, or NULL_TREE if not available.  */
+/* Returns a built-in function that realizes the vectorized version of
+   a target-independent function, or NULL_TREE if not available.  */
 DEFHOOK
 (builtin_vectorized_function,
  "This hook should return the decl of a function that implements the\n\
-vectorized variant of the builtin function with builtin function code\n\
+vectorized variant of the function with the @code{combined_fn} code\n\
 @var{code} or @code{NULL_TREE} if such a function is not available.\n\
-The value of @var{fndecl} is the builtin function declaration.  The\n\
+The return type of the vectorized function shall be of vector type\n\
+@var{vec_type_out} and the argument types should be @var{vec_type_in}.",
+ tree, (unsigned code, tree vec_type_out, tree vec_type_in),
+ default_builtin_vectorized_function)
+
+/* Returns a built-in function that realizes the vectorized version of
+   a target-specific function, or NULL_TREE if not available.  */
+DEFHOOK
+(builtin_md_vectorized_function,
+ "This hook should return the decl of a function that implements the\n\
+vectorized variant of target built-in function @code{fndecl}.  The\n\
 return type of the vectorized function shall be of vector type\n\
 @var{vec_type_out} and the argument types should be @var{vec_type_in}.",
  tree, (tree fndecl, tree vec_type_out, tree vec_type_in),
- default_builtin_vectorized_function)
+ default_builtin_md_vectorized_function)
 
 /* Returns a function declaration for a builtin that realizes the
    vector conversion, or NULL_TREE if not available.  */
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 14324b7..7852670 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -533,9 +533,15 @@ default_invalid_within_doloop (const rtx_insn *insn)
 /* Mapping of builtin functions to vectorized variants.  */
 
 tree
-default_builtin_vectorized_function (tree fndecl ATTRIBUTE_UNUSED,
-				     tree type_out ATTRIBUTE_UNUSED,
-				     tree type_in ATTRIBUTE_UNUSED)
+default_builtin_vectorized_function (unsigned int, tree, tree)
+{
+  return NULL_TREE;
+}
+
+/* Mapping of target builtin functions to vectorized variants.  */
+
+tree
+default_builtin_md_vectorized_function (tree, tree, tree)
 {
   return NULL_TREE;
 }
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index a8e7ebb..ea263da 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -83,7 +83,8 @@ extern bool default_has_ifunc_p (void);
 
 extern const char * default_invalid_within_doloop (const rtx_insn *);
 
-extern tree default_builtin_vectorized_function (tree, tree, tree);
+extern tree default_builtin_vectorized_function (unsigned int, tree, tree);
+extern tree default_builtin_md_vectorized_function (tree, tree, tree);
 
 extern tree default_builtin_vectorized_conversion (unsigned int, tree, tree);
 
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 51dff9e..75389c4 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
 tree
 vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
 {
-  tree fndecl = gimple_call_fndecl (call);
-
-  /* We only handle functions that do not read or clobber memory -- i.e.
-     const or novops ones.  */
-  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (call))
     return NULL_TREE;
 
-  if (!fndecl
-      || TREE_CODE (fndecl) != FUNCTION_DECL
-      || !DECL_BUILT_IN (fndecl))
-    return NULL_TREE;
+  combined_fn fn = gimple_call_combined_fn (call);
+  if (fn != CFN_LAST)
+    return targetm.vectorize.builtin_vectorized_function
+      (fn, vectype_out, vectype_in);
 
-  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
-						        vectype_in);
+  if (gimple_call_builtin_p (call, BUILT_IN_MD))
+    return targetm.vectorize.builtin_md_vectorized_function
+      (gimple_call_fndecl (call), vectype_out, vectype_in);
+
+  return NULL_TREE;
 }
 
 

* [PATCH 3/6] Vectorize internal functions
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
  2015-11-09 16:21 ` [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c Richard Sandiford
  2015-11-09 16:25 ` [PATCH 2/6] Make builtin_vectorized_function take a combined_fn Richard Sandiford
@ 2015-11-09 16:27 ` Richard Sandiford
  2015-11-17  9:30   ` Ping: " Richard Sandiford
  2015-11-09 16:28 ` [PATCH 4/6] Simplify ix86_builtin_vectorized_function Richard Sandiford
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:27 UTC (permalink / raw)
  To: gcc-patches

This patch makes the vectorizer try to vectorize calls to built-in and
internal functions as internal functions first, falling back on the
existing built-in target hooks if that fails.
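
As a rough illustration of that lookup order, here is a toy model.  It
is only a sketch: every toy_* name is hypothetical and none of it is
GCC's API.  The three flags stand in for direct_internal_fn_supported_p,
the builtin_vectorized_function hook and the builtin_md_vectorized_function
hook respectively.

```cpp
// Toy model of the lookup order only; all names here are hypothetical.
#include <cassert>
#include <string>

enum toy_fn { TOY_SQRT, TOY_EXP2, TOY_MD_RSQRT };

struct toy_target
{
  bool has_vector_sqrt_optab;  // stand-in for direct_internal_fn_supported_p
  bool has_exp2_builtin;       // stand-in for builtin_vectorized_function
  bool has_md_rsqrt_builtin;   // stand-in for builtin_md_vectorized_function
};

// Return a label for the path that would produce the vector call,
// or an empty string if the call would not be vectorized.
std::string
toy_vectorize_call (toy_fn fn, bool touches_memory, const toy_target &target)
{
  // Calls that read from or write to memory are rejected up front.
  if (touches_memory)
    return "";

  // In this toy, TOY_MD_RSQRT plays the role of a target-specific
  // built-in, which has no combined function code.
  bool is_combined_fn = (fn == TOY_SQRT || fn == TOY_EXP2);

  // First try an internal function backed directly by a vector optab.
  if (fn == TOY_SQRT && target.has_vector_sqrt_optab)
    return "internal function";

  // If that fails, ask the target hook that matches the kind of callee.
  if (is_combined_fn)
    {
      if (fn == TOY_EXP2 && target.has_exp2_builtin)
        return "target built-in";
      return "";
    }

  if (target.has_md_rsqrt_builtin)
    return "target md built-in";
  return "";
}
```

With all three flags set, a sqrt call takes the internal-function path
and never reaches the hooks, which is why the later patches in the
series can drop the corresponding hook entries.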


gcc/
	* internal-fn.h (direct_internal_fn_info): Add vectorizable flag.
	* internal-fn.c (direct_internal_fn_array): Update accordingly.
	* tree-vectorizer.h (vectorizable_function): Delete.
	* tree-vect-stmts.c: Include internal-fn.h.
	(vectorizable_internal_function): New function.
	(vectorizable_function): Inline into...
	(vectorizable_call): ...here.  Explicitly reject calls that read
	from or write to memory.  Try using an internal function before
	falling back on the old vectorizable_function behavior.

diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 898c83d..a5bda2f 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -69,13 +69,13 @@ init_internal_fns ()
 
 /* Create static initializers for the information returned by
    direct_internal_fn.  */
-#define not_direct { -2, -2 }
-#define mask_load_direct { -1, -1 }
-#define load_lanes_direct { -1, -1 }
-#define mask_store_direct { 3, 3 }
-#define store_lanes_direct { 0, 0 }
-#define unary_direct { 0, 0 }
-#define binary_direct { 0, 0 }
+#define not_direct { -2, -2, false }
+#define mask_load_direct { -1, -1, false }
+#define load_lanes_direct { -1, -1, false }
+#define mask_store_direct { 3, 3, false }
+#define store_lanes_direct { 0, 0, false }
+#define unary_direct { 0, 0, true }
+#define binary_direct { 0, 0, true }
 
 const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
index 6cb123f..aea6abd 100644
--- a/gcc/internal-fn.h
+++ b/gcc/internal-fn.h
@@ -134,6 +134,14 @@ struct direct_internal_fn_info
      function isn't directly mapped to an optab.  */
   signed int type0 : 8;
   signed int type1 : 8;
+  /* True if the function is pointwise, so that it can be vectorized by
+     converting the return type and all argument types to vectors of the
+     same number of elements.  E.g. we can vectorize an IFN_SQRT on
+     floats as an IFN_SQRT on vectors of N floats.
+
+     This only needs 1 bit, but occupies the full 16 to ensure a nice
+     layout.  */
+  unsigned int vectorizable : 16;
 };
 
 extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1];
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 75389c4..1142142 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "tree-vectorizer.h"
 #include "builtins.h"
+#include "internal-fn.h"
 
 /* For lang_hooks.types.type_for_mode.  */
 #include "langhooks.h"
@@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
     add_stmt_to_eh_lp (vec_stmt, lp_nr);
 }
 
-/* Checks if CALL can be vectorized in type VECTYPE.  Returns
-   a function declaration if the target has a vectorized version
-   of the function, or NULL_TREE if the function cannot be vectorized.  */
+/* We want to vectorize a call to combined function CFN with function
+   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
+   as the types of all inputs.  Check whether this is possible using
+   an internal function, returning its code if so or IFN_LAST if not.  */
 
-tree
-vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
+static internal_fn
+vectorizable_internal_function (combined_fn cfn, tree fndecl,
+				tree vectype_out, tree vectype_in)
 {
-  /* We only handle functions that do not read or clobber memory.  */
-  if (gimple_vuse (call))
-    return NULL_TREE;
-
-  combined_fn fn = gimple_call_combined_fn (call);
-  if (fn != CFN_LAST)
-    return targetm.vectorize.builtin_vectorized_function
-      (fn, vectype_out, vectype_in);
-
-  if (gimple_call_builtin_p (call, BUILT_IN_MD))
-    return targetm.vectorize.builtin_md_vectorized_function
-      (gimple_call_fndecl (call), vectype_out, vectype_in);
-
-  return NULL_TREE;
+  internal_fn ifn;
+  if (internal_fn_p (cfn))
+    ifn = as_internal_fn (cfn);
+  else
+    ifn = associated_internal_fn (fndecl);
+  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
+    {
+      const direct_internal_fn_info &info = direct_internal_fn (ifn);
+      if (info.vectorizable)
+	{
+	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
+	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
+	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1)))
+	    return ifn;
+	}
+    }
+  return IFN_LAST;
 }
 
 
@@ -2232,15 +2238,43 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   else
     return false;
 
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (stmt))
+    {
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+			 "function reads from or writes to memory.\n");
+      return false;
+    }
+
   /* For now, we only vectorize functions if a target specific builtin
      is available.  TODO -- in some cases, it might be profitable to
      insert the calls for pieces of the vector, in order to be able
      to vectorize other operations in the loop.  */
-  fndecl = vectorizable_function (stmt, vectype_out, vectype_in);
-  if (fndecl == NULL_TREE)
+  fndecl = NULL_TREE;
+  internal_fn ifn = IFN_LAST;
+  combined_fn cfn = gimple_call_combined_fn (stmt);
+  tree callee = gimple_call_fndecl (stmt);
+
+  /* First try using an internal function.  */
+  if (cfn != CFN_LAST)
+    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
+					  vectype_in);
+
+  /* If that fails, try asking for a target-specific built-in function.  */
+  if (ifn == IFN_LAST)
+    {
+      if (cfn != CFN_LAST)
+	fndecl = targetm.vectorize.builtin_vectorized_function
+	  (cfn, vectype_out, vectype_in);
+      else
+	fndecl = targetm.vectorize.builtin_md_vectorized_function
+	  (callee, vectype_out, vectype_in);
+    }
+
+  if (ifn == IFN_LAST && !fndecl)
     {
-      if (gimple_call_internal_p (stmt)
-	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
+      if (cfn == CFN_GOMP_SIMD_LANE
 	  && !slp_node
 	  && loop_vinfo
 	  && LOOP_VINFO_LOOP (loop_vinfo)->simduid
@@ -2261,8 +2295,6 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	}
     }
 
-  gcc_assert (!gimple_vuse (stmt));
-
   if (slp_node || PURE_SLP_STMT (stmt_info))
     ncopies = 1;
   else if (modifier == NARROW)
@@ -2324,7 +2356,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 		      vec<tree> vec_oprndsk = vec_defs[k];
 		      vargs[k] = vec_oprndsk[i];
 		    }
-		  new_stmt = gimple_build_call_vec (fndecl, vargs);
+		  if (ifn != IFN_LAST)
+		    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
+		  else
+		    new_stmt = gimple_build_call_vec (fndecl, vargs);
 		  new_temp = make_ssa_name (vec_dest, new_stmt);
 		  gimple_call_set_lhs (new_stmt, new_temp);
 		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
@@ -2372,7 +2407,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	    }
 	  else
 	    {
-	      new_stmt = gimple_build_call_vec (fndecl, vargs);
+	      if (ifn != IFN_LAST)
+		new_stmt = gimple_build_call_internal_vec (ifn, vargs);
+	      else
+		new_stmt = gimple_build_call_vec (fndecl, vargs);
 	      new_temp = make_ssa_name (vec_dest, new_stmt);
 	      gimple_call_set_lhs (new_stmt, new_temp);
 	    }
@@ -2418,7 +2456,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 		      vargs.quick_push (vec_oprndsk[i]);
 		      vargs.quick_push (vec_oprndsk[i + 1]);
 		    }
-		  new_stmt = gimple_build_call_vec (fndecl, vargs);
+		  if (ifn != IFN_LAST)
+		    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
+		  else
+		    new_stmt = gimple_build_call_vec (fndecl, vargs);
 		  new_temp = make_ssa_name (vec_dest, new_stmt);
 		  gimple_call_set_lhs (new_stmt, new_temp);
 		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
@@ -2456,7 +2497,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
 	      vargs.quick_push (vec_oprnd1);
 	    }
 
-	  new_stmt = gimple_build_call_vec (fndecl, vargs);
+	  if (ifn != IFN_LAST)
+	    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
+	  else
+	    new_stmt = gimple_build_call_vec (fndecl, vargs);
 	  new_temp = make_ssa_name (vec_dest, new_stmt);
 	  gimple_call_set_lhs (new_stmt, new_temp);
 	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 9cde091..bb1ab39 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -958,7 +958,6 @@ extern bool supportable_narrowing_operation (enum tree_code, tree, tree,
 					     int *, vec<tree> *);
 extern stmt_vec_info new_stmt_vec_info (gimple *stmt, vec_info *);
 extern void free_stmt_vec_info (gimple *stmt);
-extern tree vectorizable_function (gcall *, tree, tree);
 extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
                                     stmt_vector_for_cost *,
 				    stmt_vector_for_cost *);

* [PATCH 4/6] Simplify ix86_builtin_vectorized_function
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
                   ` (2 preceding siblings ...)
  2015-11-09 16:27 ` [PATCH 3/6] Vectorize internal functions Richard Sandiford
@ 2015-11-09 16:28 ` Richard Sandiford
  2015-11-09 19:55   ` Uros Bizjak
  2015-11-09 16:30 ` [PATCH 5/6] Simplify rs6000_builtin_vectorized_function Richard Sandiford
  2015-11-09 16:32 ` [PATCH 6/6] Simplify aarch64_builtin_vectorized_function Richard Sandiford
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: ubizjak

After the previous patches it's no longer necessary for
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION to return functions that
map to the vector optab of the original operation.  The vectorizer
now uses a vector form of the internal function instead.


gcc/
	* config/i386/i386.c (ix86_builtin_vectorized_function): Remove
	entries that map directly to optabs.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index a1d59a5..1003ce1 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -41746,27 +41746,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
 
   switch (fn)
     {
-    CASE_CFN_SQRT:
-      if (out_mode == DFmode && in_mode == DFmode)
-	{
-	  if (out_n == 2 && in_n == 2)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD);
-	  else if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD256);
-	  else if (out_n == 8 && in_n == 8)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPD512);
-	}
-      if (out_mode == SFmode && in_mode == SFmode)
-	{
-	  if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR);
-	  else if (out_n == 8 && in_n == 8)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR256);
-	  else if (out_n == 16 && in_n == 16)
-	    return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR512);
-	}
-      break;
-
     CASE_CFN_EXP2:
       if (out_mode == SFmode && in_mode == SFmode)
 	{
@@ -41869,27 +41848,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
 	}
       break;
 
-    CASE_CFN_COPYSIGN:
-      if (out_mode == DFmode && in_mode == DFmode)
-	{
-	  if (out_n == 2 && in_n == 2)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD);
-	  else if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD256);
-	  else if (out_n == 8 && in_n == 8)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD512);
-	}
-      if (out_mode == SFmode && in_mode == SFmode)
-	{
-	  if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS);
-	  else if (out_n == 8 && in_n == 8)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS256);
-	  else if (out_n == 16 && in_n == 16)
-	    return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS512);
-	}
-      break;
-
     CASE_CFN_FLOOR:
       /* The round insn does not trap on denormals.  */
       if (flag_trapping_math || !TARGET_ROUND)
@@ -41974,27 +41932,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
 	}
       break;
 
-    CASE_CFN_ROUND:
-      /* The round insn does not trap on denormals.  */
-      if (flag_trapping_math || !TARGET_ROUND)
-	break;
-
-      if (out_mode == DFmode && in_mode == DFmode)
-	{
-	  if (out_n == 2 && in_n == 2)
-	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ);
-	  else if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ256);
-	}
-      if (out_mode == SFmode && in_mode == SFmode)
-	{
-	  if (out_n == 4 && in_n == 4)
-	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ);
-	  else if (out_n == 8 && in_n == 8)
-	    return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ256);
-	}
-      break;
-
     CASE_CFN_FMA:
       if (out_mode == DFmode && in_mode == DFmode)
 	{

* [PATCH 5/6] Simplify rs6000_builtin_vectorized_function
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
                   ` (3 preceding siblings ...)
  2015-11-09 16:28 ` [PATCH 4/6] Simplify ix86_builtin_vectorized_function Richard Sandiford
@ 2015-11-09 16:30 ` Richard Sandiford
  2015-11-09 17:47   ` David Edelsohn
  2015-11-09 16:32 ` [PATCH 6/6] Simplify aarch64_builtin_vectorized_function Richard Sandiford
  5 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:30 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc

After the previous patches it's no longer necessary for
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION to return functions that
map to the vector optab of the original operation.  The vectorizer
now uses a vector form of the internal function instead.


gcc/
	* config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
	entries that map directly to optabs.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 26a0410..aa55b8e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4913,19 +4913,6 @@ rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
 
   switch (fn)
     {
-    CASE_CFN_CLZ:
-      if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
-	{
-	  if (out_mode == QImode && out_n == 16)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZB];
-	  else if (out_mode == HImode && out_n == 8)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZH];
-	  else if (out_mode == SImode && out_n == 4)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZW];
-	  else if (out_mode == DImode && out_n == 2)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VCLZD];
-	}
-      break;
     CASE_CFN_COPYSIGN:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2
@@ -4940,29 +4927,6 @@ rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
 	  && in_mode == SFmode && in_n == 4)
 	return rs6000_builtin_decls[ALTIVEC_BUILTIN_COPYSIGN_V4SF];
       break;
-    CASE_CFN_POPCOUNT:
-      if (TARGET_P8_VECTOR && in_mode == out_mode && out_n == in_n)
-	{
-	  if (out_mode == QImode && out_n == 16)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTB];
-	  else if (out_mode == HImode && out_n == 8)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTH];
-	  else if (out_mode == SImode && out_n == 4)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTW];
-	  else if (out_mode == DImode && out_n == 2)
-	    return rs6000_builtin_decls[P8V_BUILTIN_VPOPCNTD];
-	}
-      break;
-    CASE_CFN_SQRT:
-      if (VECTOR_UNIT_VSX_P (V2DFmode)
-	  && out_mode == DFmode && out_n == 2
-	  && in_mode == DFmode && in_n == 2)
-	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTDP];
-      if (VECTOR_UNIT_VSX_P (V4SFmode)
-	  && out_mode == SFmode && out_n == 4
-	  && in_mode == SFmode && in_n == 4)
-	return rs6000_builtin_decls[VSX_BUILTIN_XVSQRTSP];
-      break;
     CASE_CFN_CEIL:
       if (VECTOR_UNIT_VSX_P (V2DFmode)
 	  && out_mode == DFmode && out_n == 2

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/6] Simplify aarch64_builtin_vectorized_function
  2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
                   ` (4 preceding siblings ...)
  2015-11-09 16:30 ` [PATCH 5/6] Simplify rs6000_builtin_vectorized_function Richard Sandiford
@ 2015-11-09 16:32 ` Richard Sandiford
  5 siblings, 0 replies; 19+ messages in thread
From: Richard Sandiford @ 2015-11-09 16:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: marcus.shawcroft, richard.earnshaw, james.greenhalgh

After the previous patches it's no longer necessary for
TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
map to the vector optab of the original operation.  We'll use
a vector form of the internal function instead.


gcc/
	* config/aarch64/aarch64-builtins.c
	(aarch64_builtin_vectorized_function): Remove entries that map
	directly to optabs.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index c4cda4f..2a560a9 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -1288,40 +1288,6 @@ aarch64_builtin_vectorized_function (unsigned int fn, tree type_out,
     {
 #undef AARCH64_CHECK_BUILTIN_MODE
 #define AARCH64_CHECK_BUILTIN_MODE(C, N) \
-  (out_mode == N##Fmode && out_n == C \
-   && in_mode == N##Fmode && in_n == C)
-    CASE_CFN_FLOOR:
-      return AARCH64_FIND_FRINT_VARIANT (floor);
-    CASE_CFN_CEIL:
-      return AARCH64_FIND_FRINT_VARIANT (ceil);
-    CASE_CFN_TRUNC:
-      return AARCH64_FIND_FRINT_VARIANT (btrunc);
-    CASE_CFN_ROUND:
-      return AARCH64_FIND_FRINT_VARIANT (round);
-    CASE_CFN_NEARBYINT:
-      return AARCH64_FIND_FRINT_VARIANT (nearbyint);
-    CASE_CFN_SQRT:
-      return AARCH64_FIND_FRINT_VARIANT (sqrt);
-#undef AARCH64_CHECK_BUILTIN_MODE
-#define AARCH64_CHECK_BUILTIN_MODE(C, N) \
-  (out_mode == SImode && out_n == C \
-   && in_mode == N##Imode && in_n == C)
-    CASE_CFN_CLZ:
-      {
-	if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_clzv4si];
-	return NULL_TREE;
-      }
-    CASE_CFN_CTZ:
-      {
-	if (AARCH64_CHECK_BUILTIN_MODE (2, S))
-	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv2si];
-	else if (AARCH64_CHECK_BUILTIN_MODE (4, S))
-	  return aarch64_builtin_decls[AARCH64_SIMD_BUILTIN_UNOP_ctzv4si];
-	return NULL_TREE;
-      }
-#undef AARCH64_CHECK_BUILTIN_MODE
-#define AARCH64_CHECK_BUILTIN_MODE(C, N) \
   (out_mode == N##Imode && out_n == C \
    && in_mode == N##Fmode && in_n == C)
     CASE_CFN_IFLOOR:

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 5/6] Simplify rs6000_builtin_vectorized_function
  2015-11-09 16:30 ` [PATCH 5/6] Simplify rs6000_builtin_vectorized_function Richard Sandiford
@ 2015-11-09 17:47   ` David Edelsohn
  0 siblings, 0 replies; 19+ messages in thread
From: David Edelsohn @ 2015-11-09 17:47 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

On Mon, Nov 9, 2015 at 8:30 AM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> After the previous patches it's no longer necessary for
> TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
> map to the vector optab of the original operation.  We'll use
> a vector form of the internal function instead.
>
>
> gcc/
>         * config/rs6000/rs6000.c (rs6000_builtin_vectorized_function): Remove
>         entries that map directly to optabs.

Okay.

Thanks, David

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 4/6] Simplify ix86_builtin_vectorized_function
  2015-11-09 16:28 ` [PATCH 4/6] Simplify ix86_builtin_vectorized_function Richard Sandiford
@ 2015-11-09 19:55   ` Uros Bizjak
  0 siblings, 0 replies; 19+ messages in thread
From: Uros Bizjak @ 2015-11-09 19:55 UTC (permalink / raw)
  To: gcc-patches, Uros Bizjak, Richard Sandiford

On Mon, Nov 9, 2015 at 5:28 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> After the previous patches it's no longer necessary for
> TARGET_BUILTIN_VECTORIZED_FUNCTION to return functions that
> map to the vector optab of the original operation.  We'll use
> a vector form of the internal function instead.
>
>
> gcc/
>         * config/i386/i386.c (ix86_builtin_vectorized_function): Remove
>         entries that map directly to optabs.

OK.

Thanks,
Uros.

> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index a1d59a5..1003ce1 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -41746,27 +41746,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>
>    switch (fn)
>      {
> -    CASE_CFN_SQRT:
> -      if (out_mode == DFmode && in_mode == DFmode)
> -       {
> -         if (out_n == 2 && in_n == 2)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPD);
> -         else if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPD256);
> -         else if (out_n == 8 && in_n == 8)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPD512);
> -       }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -       {
> -         if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR);
> -         else if (out_n == 8 && in_n == 8)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR256);
> -         else if (out_n == 16 && in_n == 16)
> -           return ix86_get_builtin (IX86_BUILTIN_SQRTPS_NR512);
> -       }
> -      break;
> -
>      CASE_CFN_EXP2:
>        if (out_mode == SFmode && in_mode == SFmode)
>         {
> @@ -41869,27 +41848,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>         }
>        break;
>
> -    CASE_CFN_COPYSIGN:
> -      if (out_mode == DFmode && in_mode == DFmode)
> -       {
> -         if (out_n == 2 && in_n == 2)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD);
> -         else if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD256);
> -         else if (out_n == 8 && in_n == 8)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPD512);
> -       }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -       {
> -         if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS);
> -         else if (out_n == 8 && in_n == 8)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS256);
> -         else if (out_n == 16 && in_n == 16)
> -           return ix86_get_builtin (IX86_BUILTIN_CPYSGNPS512);
> -       }
> -      break;
> -
>      CASE_CFN_FLOOR:
>        /* The round insn does not trap on denormals.  */
>        if (flag_trapping_math || !TARGET_ROUND)
> @@ -41974,27 +41932,6 @@ ix86_builtin_vectorized_function (unsigned int fn, tree type_out,
>         }
>        break;
>
> -    CASE_CFN_ROUND:
> -      /* The round insn does not trap on denormals.  */
> -      if (flag_trapping_math || !TARGET_ROUND)
> -       break;
> -
> -      if (out_mode == DFmode && in_mode == DFmode)
> -       {
> -         if (out_n == 2 && in_n == 2)
> -           return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ);
> -         else if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_ROUNDPD_AZ256);
> -       }
> -      if (out_mode == SFmode && in_mode == SFmode)
> -       {
> -         if (out_n == 4 && in_n == 4)
> -           return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ);
> -         else if (out_n == 8 && in_n == 8)
> -           return ix86_get_builtin (IX86_BUILTIN_ROUNDPS_AZ256);
> -       }
> -      break;
> -
>      CASE_CFN_FMA:
>        if (out_mode == DFmode && in_mode == DFmode)
>         {
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-09 16:21 ` [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c Richard Sandiford
@ 2015-11-10 10:21   ` Richard Biener
  2015-11-10 10:57     ` Richard Sandiford
  2015-11-10 17:29     ` Joseph Myers
  0 siblings, 2 replies; 19+ messages in thread
From: Richard Biener @ 2015-11-10 10:21 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> In practice all targets that can vectorise sqrt define the appropriate
> sqrt<mode>2 optab.  The only case where this isn't immediately obvious
> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
> be exercised for sqrt.
>
> This patch therefore uses the internal function interface instead of
> going via the target hook.
>
>
> gcc/
>         * tree-vect-patterns.c: Include internal-fn.h.
>         (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index bab9a4f..a803e8c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "dumpfile.h"
>  #include "builtins.h"
> +#include "internal-fn.h"
>  #include "case-cfn-macros.h"
>
>  /* Pattern recognition functions  */
> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec<gimple *> *stmts, tree *type_in,
>    if (TREE_CODE (exp) == REAL_CST
>        && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
>      {
> -      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
>        *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
> -      if (*type_in)
> +      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
>         {
> -         gcall *stmt = gimple_build_call (newfn, 1, base);
> -         if (vectorizable_function (stmt, *type_in, *type_in)
> -             != NULL_TREE)
> -           {
> -             var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
> -             gimple_call_set_lhs (stmt, var);
> -             return stmt;
> -           }
> +         gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
> +         var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
> +         gimple_call_set_lhs (stmt, var);
> +         return stmt;

Looks ok but I wonder if this is dead code with

(for pows (POW)
     sqrts (SQRT)
     cbrts (CBRT)
 (simplify
  (pows @0 REAL_CST@1)
  (with {
    const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
    REAL_VALUE_TYPE tmp;
   }
   (switch
...
    /* pow(x,0.5) -> sqrt(x).  */
    (if (flag_unsafe_math_optimizations
         && canonicalize_math_p ()
         && real_equal (value, &dconsthalf))
     (sqrts @0))

also wondering here about canonicalize_math_p (), I'd expected the
reverse transform as canonicalization.  Also wondering about
flag_unsafe_math_optimizations (missing from the vectorizer pattern).

Anyway, patch is ok.

Thanks,
Richard.

>         }
>      }
>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn
  2015-11-09 16:25 ` [PATCH 2/6] Make builtin_vectorized_function take a combined_fn Richard Sandiford
@ 2015-11-10 10:36   ` Richard Biener
  2015-11-13 12:27     ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2015-11-10 10:36 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> This patch replaces the fndecl argument to builtin_vectorized_function
> with a combined_fn and gets the vectoriser to call it for internal
> functions too.  The patch also moves vectorisation of machine-specific
> built-ins to a new hook, builtin_md_vectorized_function.
>
> I've attached a -b version too since that's easier to read.

@@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl,
tree type_out,

   /* Dispatch to a handler for a vectorization library.  */
   if (ix86_veclib_handler)
-    return ix86_veclib_handler ((enum built_in_function) fn, type_out,
-                               type_in);
+    return ix86_veclib_handler (combined_fn (fn), type_out, type_in);

   return NULL_TREE;
 }

fn is already a combined_fn?  Why does the builtin_vectorized_function
not take one but an unsigned int?

@@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function
fn, tree type_out, tree type_in)
       return NULL_TREE;
     }

-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));

-  if (fn == BUILT_IN_LOGF)
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)

with 'fn' now a combined_fn how is this going to work with IFNs?

@@ -42194,9 +42096,7 @@ ix86_veclibabi_svml (enum built_in_function
fn, tree type_out, tree type_in)
   name[4] &= ~0x20;

   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-       args;
-       args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
     arity++;


or this?

Did you try this out?  We have only two basic testcases for all this
code using sin(), which may not end up as IFN even with -ffast-math(?).

+/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
+
+static tree
+rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
+                                      tree type_in)
+{

any reason you are using a fndecl for this hook instead of the function code?

@@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt,
gimple *vec_stmt,
 tree
 vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
 {
-  tree fndecl = gimple_call_fndecl (call);
-
-  /* We only handle functions that do not read or clobber memory -- i.e.
-     const or novops ones.  */
-  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (call))
     return NULL_TREE;

-  if (!fndecl
-      || TREE_CODE (fndecl) != FUNCTION_DECL
-      || !DECL_BUILT_IN (fndecl))
-    return NULL_TREE;
+  combined_fn fn = gimple_call_combined_fn (call);
+  if (fn != CFN_LAST)
+    return targetm.vectorize.builtin_vectorized_function
+      (fn, vectype_out, vectype_in);

-  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
-                                                       vectype_in);
+  if (gimple_call_builtin_p (call, BUILT_IN_MD))
+    return targetm.vectorize.builtin_md_vectorized_function
+      (gimple_call_fndecl (call), vectype_out, vectype_in);
+
+  return NULL_TREE;

Looking at this and the issues above wouldn't it be easier to simply
pass the call stmt to the hook (which then can again handle
both normal and target builtins)?  And it has context available
(actual arguments and number of arguments for IFN calls).

Richard.

>
> gcc/
>         * target.def (builtin_vectorized_function): Take a combined_fn (in
>         the form of an unsigned int) rather than a function decl.
>         (builtin_md_vectorized_function): New.
>         * targhooks.h (default_builtin_vectorized_function): Replace the
>         fndecl argument with an unsigned int.
>         (default_builtin_md_vectorized_function): Declare.
>         * targhooks.c (default_builtin_vectorized_function): Replace the
>         fndecl argument with an unsigned int.
>         (default_builtin_md_vectorized_function): New function.
>         * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION):
>         New hook.
>         * doc/tm.texi: Regenerate.
>         * tree-vect-stmts.c (vectorizable_function): Update call to
>         builtin_vectorized_function, also passing internal functions.
>         Call builtin_md_vectorized_function for target-specific builtins.
>         * config/aarch64/aarch64-protos.h
>         (aarch64_builtin_vectorized_function): Replace fndecl argument
>         with an unsigned int.
>         * config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h.
>         (aarch64_builtin_vectorized_function): Update after above changes.
>         Use CASE_CFN_*.
>         * config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace
>         fndecl argument with an unsigned int.
>         * config/arm/arm-builtins.c: Include case-cfn-macros.h.
>         (arm_builtin_vectorized_function): Update after above changes.
>         Use CASE_CFN_*.
>         * config/i386/i386.c: Include case-cfn-macros.h.
>         (ix86_veclib_handler): Take a combined_fn rather than a
>         built_in_function.
>         (ix86_veclibabi_svml, ix86_veclibabi_acml): Likewise.  Use
>         mathfn_built_in rather than calling builtin_decl_implicit directly.
>         (ix86_builtin_vectorized_function): Update after above changes.
>         Use CASE_CFN_*.
>         * config/rs6000/rs6000.c: Include case-cfn-macros.h.
>         (rs6000_builtin_vectorized_libmass): Replace fndecl argument
>         with a combined_fn.  Use CASE_CFN_*.  Use mathfn_built_in rather
>         than calling builtin_decl_implicit directly.
>         (rs6000_builtin_vectorized_function): Update after above changes.
>         Use CASE_CFN_*.  Move BUILT_IN_MD to...
>         (rs6000_builtin_md_vectorized_function): ...this new function.
>         (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): Define.
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-10 10:21   ` Richard Biener
@ 2015-11-10 10:57     ` Richard Sandiford
  2015-11-10 14:42       ` Richard Biener
  2015-11-10 17:29     ` Joseph Myers
  1 sibling, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-10 10:57 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> In practice all targets that can vectorise sqrt define the appropriate
>> sqrt<mode>2 optab.  The only case where this isn't immediately obvious
>> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
>> be exercised for sqrt.
>>
>> This patch therefore uses the internal function interface instead of
>> going via the target hook.
>>
>>
>> gcc/
>>         * tree-vect-patterns.c: Include internal-fn.h.
>>         (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.
>>
>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>> index bab9a4f..a803e8c 100644
>> --- a/gcc/tree-vect-patterns.c
>> +++ b/gcc/tree-vect-patterns.c
>> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-vectorizer.h"
>>  #include "dumpfile.h"
>>  #include "builtins.h"
>> +#include "internal-fn.h"
>>  #include "case-cfn-macros.h"
>>
>>  /* Pattern recognition functions  */
>> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec<gimple *> *stmts, tree *type_in,
>>    if (TREE_CODE (exp) == REAL_CST
>>        && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
>>      {
>> -      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
>>        *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
>> -      if (*type_in)
>> +      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
>>         {
>> -         gcall *stmt = gimple_build_call (newfn, 1, base);
>> -         if (vectorizable_function (stmt, *type_in, *type_in)
>> -             != NULL_TREE)
>> -           {
>> -             var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
>> -             gimple_call_set_lhs (stmt, var);
>> -             return stmt;
>> -           }
>> +         gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
>> +         var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
>> +         gimple_call_set_lhs (stmt, var);
>> +         return stmt;
>
> Looks ok but I wonder if this is dead code with
>
> (for pows (POW)
>      sqrts (SQRT)
>      cbrts (CBRT)
>  (simplify
>   (pows @0 REAL_CST@1)
>   (with {
>     const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
>     REAL_VALUE_TYPE tmp;
>    }
>    (switch
> ...
>     /* pow(x,0.5) -> sqrt(x).  */
>     (if (flag_unsafe_math_optimizations
>          && canonicalize_math_p ()
>          && real_equal (value, &dconsthalf))
>      (sqrts @0))

Yeah, I wondered that too, although I think it's more likely to be dead
because of sincos.  In the end it just seemed like a rabbit hole too far
though.

> Anyway, patch is ok.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-10 10:57     ` Richard Sandiford
@ 2015-11-10 14:42       ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2015-11-10 14:42 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, richard.sandiford

On Tue, Nov 10, 2015 at 11:57 AM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>>> In practice all targets that can vectorise sqrt define the appropriate
>>> sqrt<mode>2 optab.  The only case where this isn't immediately obvious
>>> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
>>> be exercised for sqrt.
>>>
>>> This patch therefore uses the internal function interface instead of
>>> going via the target hook.
>>>
>>>
>>> gcc/
>>>         * tree-vect-patterns.c: Include internal-fn.h.
>>>         (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.
>>>
>>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>>> index bab9a4f..a803e8c 100644
>>> --- a/gcc/tree-vect-patterns.c
>>> +++ b/gcc/tree-vect-patterns.c
>>> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>>>  #include "tree-vectorizer.h"
>>>  #include "dumpfile.h"
>>>  #include "builtins.h"
>>> +#include "internal-fn.h"
>>>  #include "case-cfn-macros.h"
>>>
>>>  /* Pattern recognition functions  */
>>> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec<gimple *> *stmts, tree *type_in,
>>>    if (TREE_CODE (exp) == REAL_CST
>>>        && real_equal (&TREE_REAL_CST (exp), &dconsthalf))
>>>      {
>>> -      tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
>>>        *type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
>>> -      if (*type_in)
>>> +      if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
>>>         {
>>> -         gcall *stmt = gimple_build_call (newfn, 1, base);
>>> -         if (vectorizable_function (stmt, *type_in, *type_in)
>>> -             != NULL_TREE)
>>> -           {
>>> -             var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
>>> -             gimple_call_set_lhs (stmt, var);
>>> -             return stmt;
>>> -           }
>>> +         gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
>>> +         var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
>>> +         gimple_call_set_lhs (stmt, var);
>>> +         return stmt;
>>
>> Looks ok but I wonder if this is dead code with
>>
>> (for pows (POW)
>>      sqrts (SQRT)
>>      cbrts (CBRT)
>>  (simplify
>>   (pows @0 REAL_CST@1)
>>   (with {
>>     const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
>>     REAL_VALUE_TYPE tmp;
>>    }
>>    (switch
>> ...
>>     /* pow(x,0.5) -> sqrt(x).  */
>>     (if (flag_unsafe_math_optimizations
>>          && canonicalize_math_p ()
>>          && real_equal (value, &dconsthalf))
>>      (sqrts @0))
>
> Yeah, I wondered that too, although I think it's more likely to be dead
> because of sincos.  In the end it just seemed like a rabbit hole too far
> though.

Indeed.  sincos should also take care of the pow(x, 2.) case handled here.

Richard.

>> Anyway, patch is ok.
>
> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-10 10:21   ` Richard Biener
  2015-11-10 10:57     ` Richard Sandiford
@ 2015-11-10 17:29     ` Joseph Myers
  2015-11-10 19:10       ` Richard Biener
  1 sibling, 1 reply; 19+ messages in thread
From: Joseph Myers @ 2015-11-10 17:29 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, richard.sandiford

On Tue, 10 Nov 2015, Richard Biener wrote:

> Looks ok but I wonder if this is dead code with
> 
> (for pows (POW)
>      sqrts (SQRT)
>      cbrts (CBRT)
>  (simplify
>   (pows @0 REAL_CST@1)
>   (with {
>     const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
>     REAL_VALUE_TYPE tmp;
>    }
>    (switch
> ...
>     /* pow(x,0.5) -> sqrt(x).  */
>     (if (flag_unsafe_math_optimizations
>          && canonicalize_math_p ()
>          && real_equal (value, &dconsthalf))
>      (sqrts @0))
> 
> also wondering here about canonicalize_math_p (), I'd expected the
> reverse transform as canonicalization.  Also wondering about
> flag_unsafe_math_optimizations (missing from the vectorizer pattern).

pow(x,0.5) -> sqrt(x) is unsafe because: pow (-0, 0.5) is specified in 
Annex F to be +0 but sqrt (-0) is -0; pow (-Inf, 0.5) is specified in 
Annex F to be +Inf, but sqrt (-Inf) is NaN with "invalid" exception 
raised.  I think it's safe in other cases (the reverse of course is not 
safe, sqrt is a fully-specified correctly-rounded IEEE operation and pow 
isn't).

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
  2015-11-10 17:29     ` Joseph Myers
@ 2015-11-10 19:10       ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2015-11-10 19:10 UTC (permalink / raw)
  To: Joseph Myers; +Cc: GCC Patches, richard.sandiford

On November 10, 2015 6:29:36 PM GMT+01:00, Joseph Myers <joseph@codesourcery.com> wrote:
>On Tue, 10 Nov 2015, Richard Biener wrote:
>
>> Looks ok but I wonder if this is dead code with
>> 
>> (for pows (POW)
>>      sqrts (SQRT)
>>      cbrts (CBRT)
>>  (simplify
>>   (pows @0 REAL_CST@1)
>>   (with {
>>     const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
>>     REAL_VALUE_TYPE tmp;
>>    }
>>    (switch
>> ...
>>     /* pow(x,0.5) -> sqrt(x).  */
>>     (if (flag_unsafe_math_optimizations
>>          && canonicalize_math_p ()
>>          && real_equal (value, &dconsthalf))
>>      (sqrts @0))
>> 
>> also wondering here about canonicalize_math_p (), I'd expected the
>> reverse transform as canonicalization.  Also wondering about
>> flag_unsafe_math_optimizations (missing from the vectorizer pattern).
>
>pow(x,0.5) -> sqrt(x) is unsafe because: pow (-0, 0.5) is specified in 
>Annex F to be +0 but sqrt (-0) is -0; pow (-Inf, 0.5) is specified in 
>Annex F to be +Inf, but sqrt (-Inf) is NaN with "invalid" exception 
>raised.  I think it's safe in other cases

So rather than being unsafe, it's safe with no signed zeros and finite
math.  The reverse would additionally be unsafe (pow is not fully
specified and correctly rounded).

>(the reverse of course is not
>safe, sqrt is a fully-specified correctly-rounded IEEE operation and
>pow isn't).
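
The condition discussed here (no signed zeros, finite math only) can be
sketched in plain C as follows.  The struct and function names are
hypothetical stand-ins rather than GCC's real flags or predicates, and
the sketch takes on trust the statement that these are the only two
problem inputs.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of the two relevant floating-point assumptions.  */
struct toy_fp_flags
{
  bool honor_signed_zeros;
  bool honor_infinities;
};

/* True if X is one of the two inputs for which pow (x, 0.5) and
   sqrt (x) are specified to give different results.  */
static bool
is_problem_input (double x)
{
  bool negative_zero = x == 0.0 && signbit (x) != 0;
  bool negative_inf = isinf (x) && x < 0.0;
  return negative_zero || negative_inf;
}

/* The replacement is fine once neither problem input has to be
   honoured.  */
static bool
pow_half_to_sqrt_ok (struct toy_fp_flags flags)
{
  return !flags.honor_signed_zeros && !flags.honor_infinities;
}

/* Walk through the cases from the discussion.  */
static void
check_flag_model (void)
{
  struct toy_fp_flags strict = { true, true };
  struct toy_fp_flags relaxed = { false, false };

  assert (!pow_half_to_sqrt_ok (strict));
  assert (pow_half_to_sqrt_ok (relaxed));
  assert (is_problem_input (-0.0));
  assert (is_problem_input (-INFINITY));
  assert (!is_problem_input (4.0));
}
```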


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn
  2015-11-10 10:36   ` Richard Biener
@ 2015-11-13 12:27     ` Richard Sandiford
  2015-11-16 13:58       ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-13 12:27 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>> This patch replaces the fndecl argument to builtin_vectorized_function
>> with a combined_fn and gets the vectoriser to call it for internal
>> functions too.  The patch also moves vectorisation of machine-specific
>> built-ins to a new hook, builtin_md_vectorized_function.
>>
>> I've attached a -b version too since that's easier to read.
>
> @@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl,
> tree type_out,
>
>    /* Dispatch to a handler for a vectorization library.  */
>    if (ix86_veclib_handler)
> -    return ix86_veclib_handler ((enum built_in_function) fn, type_out,
> -                               type_in);
> +    return ix86_veclib_handler (combined_fn (fn), type_out, type_in);
>
>    return NULL_TREE;
>  }
>
> fn is already a combined_fn?  Why does the builtin_vectorized_function
> not take one but an unsigned int?

Not everything that includes the target headers includes tree.h.
This is like builtin_conversion taking a tree_code as an unsigned int.
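
As a rough sketch of that arrangement, in plain C with made-up names
(none of these are the real GCC declarations): the hook type mentions
only unsigned int, so a header declaring it does not need the enum,
while the implementation converts back before switching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What a hook header could contain: no enum definition required.  */
typedef const char *(*toy_vectorized_function_hook) (unsigned int fn);

/* What only the implementation needs to see.  */
enum toy_combined_fn { TOY_CFN_SQRT, TOY_CFN_FLOOR, TOY_CFN_LAST };

/* Hypothetical hook implementation: convert the integer back to the
   enum and return the name of a vector routine, or NULL if none.  */
static const char *
toy_vectorized_function (unsigned int fn)
{
  switch ((enum toy_combined_fn) fn)
    {
    case TOY_CFN_SQRT:
      return "vsqrt";
    case TOY_CFN_FLOOR:
      return "vfloor";
    default:
      return NULL;
    }
}

static const toy_vectorized_function_hook toy_hook = toy_vectorized_function;

/* Callers pass the enum value, which converts to unsigned int.  */
static void
check_toy_hook (void)
{
  assert (strcmp (toy_hook (TOY_CFN_SQRT), "vsqrt") == 0);
  assert (toy_hook (TOY_CFN_LAST) == NULL);
}
```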

> @@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function
> fn, tree type_out, tree type_in)
>        return NULL_TREE;
>      }
>
> -  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
> +  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
> +  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>
> -  if (fn == BUILT_IN_LOGF)
> +  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
>
> with 'fn' now a combined_fn how is this going to work with IFNs?

By this point we already know that the function has one of the
supported modes.  A previous patch extended mathfn_built_in
to handle combined_fns.  E.g.

  mathfn_built_in (float_type_node, IFN_SQRT)

returns BUILT_IN_SQRTF.
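
A toy model of that lookup, in plain C, might look like the following.
It is not the real mathfn_built_in; the enums, the table and the
function name are invented to show how a (type, function) pair selects
the F, plain or L variant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scalar types and type-generic function codes.  */
enum toy_type { TOY_FLOAT, TOY_DOUBLE, TOY_LONG_DOUBLE, TOY_TYPE_LAST };
enum toy_fn { TOY_FN_SQRT, TOY_FN_LOG, TOY_FN_LAST };

/* Return the name of the variant of FN that operates on TYPE,
   or NULL if the pair is out of range.  */
static const char *
toy_mathfn_built_in (enum toy_type type, enum toy_fn fn)
{
  static const char *const names[TOY_FN_LAST][TOY_TYPE_LAST] = {
    [TOY_FN_SQRT] = { "sqrtf", "sqrt", "sqrtl" },
    [TOY_FN_LOG] = { "logf", "log", "logl" },
  };

  if ((unsigned int) type >= TOY_TYPE_LAST
      || (unsigned int) fn >= TOY_FN_LAST)
    return NULL;
  return names[fn][type];
}

/* The float variant of sqrt is sqrtf, mirroring the example above.  */
static void
check_toy_lookup (void)
{
  assert (strcmp (toy_mathfn_built_in (TOY_FLOAT, TOY_FN_SQRT),
		  "sqrtf") == 0);
  assert (strcmp (toy_mathfn_built_in (TOY_DOUBLE, TOY_FN_LOG),
		  "log") == 0);
}
```

A caller that needs the name of the scalar routine, as the SVML handler
does, can then work from the returned variant rather than from the
type-generic code.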

> +/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
> +
> +static tree
> +rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
> +                                      tree type_in)
> +{
>
> any reason you are using a fndecl for this hook instead of the function code?

It just seems more helpful to pass the fndecl when we have it.
It's cheap to go from the decl to the code but it's less cheap
to go the other way.

> @@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt,
> gimple *vec_stmt,
>  tree
>  vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
>  {
> -  tree fndecl = gimple_call_fndecl (call);
> -
> -  /* We only handle functions that do not read or clobber memory -- i.e.
> -     const or novops ones.  */
> -  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
> +  /* We only handle functions that do not read or clobber memory.  */
> +  if (gimple_vuse (call))
>      return NULL_TREE;
>
> -  if (!fndecl
> -      || TREE_CODE (fndecl) != FUNCTION_DECL
> -      || !DECL_BUILT_IN (fndecl))
> -    return NULL_TREE;
> +  combined_fn fn = gimple_call_combined_fn (call);
> +  if (fn != CFN_LAST)
> +    return targetm.vectorize.builtin_vectorized_function
> +      (fn, vectype_out, vectype_in);
>
> -  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
> -                                                       vectype_in);
> +  if (gimple_call_builtin_p (call, BUILT_IN_MD))
> +    return targetm.vectorize.builtin_md_vectorized_function
> +      (gimple_call_fndecl (call), vectype_out, vectype_in);
> +
> +  return NULL_TREE;
>
> Looking at this and the issues above wouldn't it be easier to simply
> pass the call stmt to the hook (which then can again handle
> both normal and target builtins)?  And it has context available
> (actual arguments and number of arguments for IFN calls).

I'd rather not do that, since it means we have to construct a gcall *
in cases where we're not asking about a straightforward vectorisation
of a preexisting scalar statement.

The number of arguments is an inherent property of the function;
it doesn't require access to a particular call.  The hook tells
us what vector types we're using (and by extension what types
the scalar op would have).

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn
  2015-11-13 12:27     ` Richard Sandiford
@ 2015-11-16 13:58       ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2015-11-16 13:58 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, richard.sandiford

On Fri, Nov 13, 2015 at 1:27 PM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford
>> <richard.sandiford@arm.com> wrote:
>>> This patch replaces the fndecl argument to builtin_vectorized_function
>>> with a combined_fn and gets the vectoriser to call it for internal
>>> functions too.  The patch also moves vectorisation of machine-specific
>>> built-ins to a new hook, builtin_md_vectorized_function.
>>>
>>> I've attached a -b version too since that's easier to read.
>>
>> @@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out,
>>
>>    /* Dispatch to a handler for a vectorization library.  */
>>    if (ix86_veclib_handler)
>> -    return ix86_veclib_handler ((enum built_in_function) fn, type_out,
>> -                               type_in);
>> +    return ix86_veclib_handler (combined_fn (fn), type_out, type_in);
>>
>>    return NULL_TREE;
>>  }
>>
>> fn is already a combined_fn?  Why does the builtin_vectorized_function
>> not take one but an unsigned int?
>
> Not everything that includes the target headers includes tree.h.
> This is like builtin_conversion taking a tree_code as an unsigned int.
>
>> @@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in)
>>        return NULL_TREE;
>>      }
>>
>> -  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
>> +  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
>> +  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));
>>
>> -  if (fn == BUILT_IN_LOGF)
>> +  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)
>>
>> with 'fn' now a combined_fn how is this going to work with IFNs?
>
> By this point we already know that the function has one of the
> supported modes.  A previous patch extended mathfn_built_in
> to handle combined_fns.  E.g.
>
>   mathfn_built_in (float_type_node, IFN_SQRT)
>
> returns BUILT_IN_SQRTF.

Ah, I missed that I suppose.
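
For reference, the lookup described in the quoted reply (a generic
function plus a scalar type resolving to the type-specific built-in)
can be pictured with the toy table below.  This is only a sketch: the
enums and toy_mathfn_built_in are illustrative and are not GCC's
definitions, although the enumerator names are chosen to echo the real
BUILT_IN_* codes.

```cpp
#include <cassert>

// Toy scalar types and generic function codes (not GCC's real enums).
enum scalar_type { TYPE_FLOAT, TYPE_DOUBLE, TYPE_LONG_DOUBLE };
enum generic_fn { FN_SQRT, FN_LOG };

// Toy type-specific built-in codes, one per (function, type) pair.
enum builtin_code
{
  BUILT_IN_SQRTF, BUILT_IN_SQRT, BUILT_IN_SQRTL,
  BUILT_IN_LOGF, BUILT_IN_LOG, BUILT_IN_LOGL
};

// Map a generic function and a scalar type to the built-in that
// implements the function for that type.
inline builtin_code
toy_mathfn_built_in (scalar_type type, generic_fn fn)
{
  static const builtin_code table[2][3] = {
    { BUILT_IN_SQRTF, BUILT_IN_SQRT, BUILT_IN_SQRTL },
    { BUILT_IN_LOGF, BUILT_IN_LOG, BUILT_IN_LOGL },
  };
  assert (fn <= FN_LOG && type <= TYPE_LONG_DOUBLE);
  return table[fn][type];
}
```

With this table, asking for FN_SQRT on TYPE_FLOAT yields BUILT_IN_SQRTF,
mirroring the mathfn_built_in (float_type_node, IFN_SQRT) example above.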

>> +/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
>> +
>> +static tree
>> +rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
>> +                                      tree type_in)
>> +{
>>
>> any reason you are using a fndecl for this hook instead of the function code?
>
> It just seems more helpful to pass the fndecl when we have it.
> It's cheap to go from the decl to the code but it's less cheap
> to go the other way.

Ok, but for the other hook you changed it ...

>> @@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
>>  tree
>>  vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
>>  {
>> -  tree fndecl = gimple_call_fndecl (call);
>> -
>> -  /* We only handle functions that do not read or clobber memory -- i.e.
>> -     const or novops ones.  */
>> -  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
>> +  /* We only handle functions that do not read or clobber memory.  */
>> +  if (gimple_vuse (call))
>>      return NULL_TREE;
>>
>> -  if (!fndecl
>> -      || TREE_CODE (fndecl) != FUNCTION_DECL
>> -      || !DECL_BUILT_IN (fndecl))
>> -    return NULL_TREE;
>> +  combined_fn fn = gimple_call_combined_fn (call);
>> +  if (fn != CFN_LAST)
>> +    return targetm.vectorize.builtin_vectorized_function
>> +      (fn, vectype_out, vectype_in);
>>
>> -  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
>> -                                                       vectype_in);
>> +  if (gimple_call_builtin_p (call, BUILT_IN_MD))
>> +    return targetm.vectorize.builtin_md_vectorized_function
>> +      (gimple_call_fndecl (call), vectype_out, vectype_in);
>> +
>> +  return NULL_TREE;
>>
>> Looking at this and the issues above wouldn't it be easier to simply
>> pass the call stmt to the hook (which then can again handle
>> both normal and target builtins)?  And it has context available
>> (actual arguments and number of arguments for IFN calls).
>
> I'd rather not do that, since it means we have to construct a gcall *
> in cases where we're not asking about a straight-forward vectorisation
> of a preexisting scalar statement.
>
> The number of arguments is an inherent property of the function,
> it doesn't require access to a particular call.  The hook tells
> us what vector types we're using (and by extension what types
> the scalar op would have).

... so merging the hooks by passing both the combined fn code
and the decl would be possible?  The decl can be NULL if
the fn code is not CFN_LAST and if it is CFN_LAST then the decl
may be a target builtin?
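
To make the suggestion concrete, a merged hook along those lines might
look roughly like the sketch below.  Everything here is hypothetical:
the toy combined_fn_code enum, the fn_decl struct and
merged_vectorized_function are illustrative names, and the strings
stand in for the vector function decls a real hook would return.

```cpp
#include <cassert>
#include <string>

// Toy combined function codes; CFN_LAST means "no generic code".
enum combined_fn_code { CFN_SQRT, CFN_LOG, CFN_LAST };

// Toy decl: only records whether it is a target (MD) built-in and
// which vector function the target would map it to.
struct fn_decl
{
  bool is_md_builtin;
  const char *vector_name;
};

// Hypothetical merged hook.  CODE selects a generic function unless it
// is CFN_LAST, in which case DECL (which may be null) is checked for a
// target built-in.  An empty string means "cannot be vectorized".
inline std::string
merged_vectorized_function (combined_fn_code code, const fn_decl *decl)
{
  assert (code <= CFN_LAST);
  if (code != CFN_LAST)
    {
      // Generic path: DECL is ignored and may be null.
      if (code == CFN_SQRT)
        return "vec_sqrt";
      return "";
    }
  // Target-specific path: only the decl identifies the function.
  if (decl && decl->is_md_builtin)
    return decl->vector_name;
  return "";
}
```

The cost of merging is that every target implementation has to handle
both paths in one function, which is the separation concern raised
above.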

Maybe I'm just too worried about that clean separation...  so decide
for yourselves here.

Thus, ok.

Thanks,
Richard.

> Thanks,
> Richard
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Ping: [PATCH 3/6] Vectorize internal functions
  2015-11-09 16:27 ` [PATCH 3/6] Vectorize internal functions Richard Sandiford
@ 2015-11-17  9:30   ` Richard Sandiford
  2015-11-17 14:33     ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2015-11-17  9:30 UTC (permalink / raw)
  To: gcc-patches

Thanks for all the reviews for this series.  I think the patch below
is the only target-independent one that hasn't had any comments.

Richard

Richard Sandiford <richard.sandiford@arm.com> writes:
> This patch tries to vectorize built-in and internal functions as
> internal functions first, falling back on the current built-in
> target hooks otherwise.
>
>
> gcc/
> 	* internal-fn.h (direct_internal_fn_info): Add vectorizable flag.
> 	* internal-fn.c (direct_internal_fn_array): Update accordingly.
> 	* tree-vectorizer.h (vectorizable_function): Delete.
> 	* tree-vect-stmts.c: Include internal-fn.h.
> 	(vectorizable_internal_function): New function.
> 	(vectorizable_function): Inline into...
> 	(vectorizable_call): ...here.  Explicitly reject calls that read
> 	from or write to memory.  Try using an internal function before
> 	falling back on the old vectorizable_function behavior.
>
> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
> index 898c83d..a5bda2f 100644
> --- a/gcc/internal-fn.c
> +++ b/gcc/internal-fn.c
> @@ -69,13 +69,13 @@ init_internal_fns ()
>  
>  /* Create static initializers for the information returned by
>     direct_internal_fn.  */
> -#define not_direct { -2, -2 }
> -#define mask_load_direct { -1, -1 }
> -#define load_lanes_direct { -1, -1 }
> -#define mask_store_direct { 3, 3 }
> -#define store_lanes_direct { 0, 0 }
> -#define unary_direct { 0, 0 }
> -#define binary_direct { 0, 0 }
> +#define not_direct { -2, -2, false }
> +#define mask_load_direct { -1, -1, false }
> +#define load_lanes_direct { -1, -1, false }
> +#define mask_store_direct { 3, 3, false }
> +#define store_lanes_direct { 0, 0, false }
> +#define unary_direct { 0, 0, true }
> +#define binary_direct { 0, 0, true }
>  
>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
> index 6cb123f..aea6abd 100644
> --- a/gcc/internal-fn.h
> +++ b/gcc/internal-fn.h
> @@ -134,6 +134,14 @@ struct direct_internal_fn_info
>       function isn't directly mapped to an optab.  */
>    signed int type0 : 8;
>    signed int type1 : 8;
> +  /* True if the function is pointwise, so that it can be vectorized by
> +     converting the return type and all argument types to vectors of the
> +     same number of elements.  E.g. we can vectorize an IFN_SQRT on
> +     floats as an IFN_SQRT on vectors of N floats.
> +
> +     This only needs 1 bit, but occupies the full 16 to ensure a nice
> +     layout.  */
> +  unsigned int vectorizable : 16;
>  };
>  
>  extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1];
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 75389c4..1142142 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-scalar-evolution.h"
>  #include "tree-vectorizer.h"
>  #include "builtins.h"
> +#include "internal-fn.h"
>  
>  /* For lang_hooks.types.type_for_mode.  */
>  #include "langhooks.h"
> @@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
>      add_stmt_to_eh_lp (vec_stmt, lp_nr);
>  }
>  
> -/* Checks if CALL can be vectorized in type VECTYPE.  Returns
> -   a function declaration if the target has a vectorized version
> -   of the function, or NULL_TREE if the function cannot be vectorized.  */
> +/* We want to vectorize a call to combined function CFN with function
> +   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
> +   as the types of all inputs.  Check whether this is possible using
> +   an internal function, returning its code if so or IFN_LAST if not.  */
>  
> -tree
> -vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
> +static internal_fn
> +vectorizable_internal_function (combined_fn cfn, tree fndecl,
> +				tree vectype_out, tree vectype_in)
>  {
> -  /* We only handle functions that do not read or clobber memory.  */
> -  if (gimple_vuse (call))
> -    return NULL_TREE;
> -
> -  combined_fn fn = gimple_call_combined_fn (call);
> -  if (fn != CFN_LAST)
> -    return targetm.vectorize.builtin_vectorized_function
> -      (fn, vectype_out, vectype_in);
> -
> -  if (gimple_call_builtin_p (call, BUILT_IN_MD))
> -    return targetm.vectorize.builtin_md_vectorized_function
> -      (gimple_call_fndecl (call), vectype_out, vectype_in);
> -
> -  return NULL_TREE;
> +  internal_fn ifn;
> +  if (internal_fn_p (cfn))
> +    ifn = as_internal_fn (cfn);
> +  else
> +    ifn = associated_internal_fn (fndecl);
> +  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
> +    {
> +      const direct_internal_fn_info &info = direct_internal_fn (ifn);
> +      if (info.vectorizable)
> +	{
> +	  tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> +	  tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +	  if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1)))
> +	    return ifn;
> +	}
> +    }
> +  return IFN_LAST;
>  }
>  
>  
> @@ -2232,15 +2238,43 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>    else
>      return false;
>  
> +  /* We only handle functions that do not read or clobber memory.  */
> +  if (gimple_vuse (stmt))
> +    {
> +      if (dump_enabled_p ())
> +	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +			 "function reads from or writes to memory.\n");
> +      return false;
> +    }
> +
>    /* For now, we only vectorize functions if a target specific builtin
>       is available.  TODO -- in some cases, it might be profitable to
>       insert the calls for pieces of the vector, in order to be able
>       to vectorize other operations in the loop.  */
> -  fndecl = vectorizable_function (stmt, vectype_out, vectype_in);
> -  if (fndecl == NULL_TREE)
> +  fndecl = NULL_TREE;
> +  internal_fn ifn = IFN_LAST;
> +  combined_fn cfn = gimple_call_combined_fn (stmt);
> +  tree callee = gimple_call_fndecl (stmt);
> +
> +  /* First try using an internal function.  */
> +  if (cfn != CFN_LAST)
> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
> +					  vectype_in);
> +
> +  /* If that fails, try asking for a target-specific built-in function.  */
> +  if (ifn == IFN_LAST)
> +    {
> +      if (cfn != CFN_LAST)
> +	fndecl = targetm.vectorize.builtin_vectorized_function
> +	  (cfn, vectype_out, vectype_in);
> +      else
> +	fndecl = targetm.vectorize.builtin_md_vectorized_function
> +	  (callee, vectype_out, vectype_in);
> +    }
> +
> +  if (ifn == IFN_LAST && !fndecl)
>      {
> -      if (gimple_call_internal_p (stmt)
> -	  && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
> +      if (cfn == CFN_GOMP_SIMD_LANE
>  	  && !slp_node
>  	  && loop_vinfo
>  	  && LOOP_VINFO_LOOP (loop_vinfo)->simduid
> @@ -2261,8 +2295,6 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>  	}
>      }
>  
> -  gcc_assert (!gimple_vuse (stmt));
> -
>    if (slp_node || PURE_SLP_STMT (stmt_info))
>      ncopies = 1;
>    else if (modifier == NARROW)
> @@ -2324,7 +2356,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>  		      vec<tree> vec_oprndsk = vec_defs[k];
>  		      vargs[k] = vec_oprndsk[i];
>  		    }
> -		  new_stmt = gimple_build_call_vec (fndecl, vargs);
> +		  if (ifn != IFN_LAST)
> +		    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
> +		  else
> +		    new_stmt = gimple_build_call_vec (fndecl, vargs);
>  		  new_temp = make_ssa_name (vec_dest, new_stmt);
>  		  gimple_call_set_lhs (new_stmt, new_temp);
>  		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> @@ -2372,7 +2407,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>  	    }
>  	  else
>  	    {
> -	      new_stmt = gimple_build_call_vec (fndecl, vargs);
> +	      if (ifn != IFN_LAST)
> +		new_stmt = gimple_build_call_internal_vec (ifn, vargs);
> +	      else
> +		new_stmt = gimple_build_call_vec (fndecl, vargs);
>  	      new_temp = make_ssa_name (vec_dest, new_stmt);
>  	      gimple_call_set_lhs (new_stmt, new_temp);
>  	    }
> @@ -2418,7 +2456,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>  		      vargs.quick_push (vec_oprndsk[i]);
>  		      vargs.quick_push (vec_oprndsk[i + 1]);
>  		    }
> -		  new_stmt = gimple_build_call_vec (fndecl, vargs);
> +		  if (ifn != IFN_LAST)
> +		    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
> +		  else
> +		    new_stmt = gimple_build_call_vec (fndecl, vargs);
>  		  new_temp = make_ssa_name (vec_dest, new_stmt);
>  		  gimple_call_set_lhs (new_stmt, new_temp);
>  		  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> @@ -2456,7 +2497,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>  	      vargs.quick_push (vec_oprnd1);
>  	    }
>  
> -	  new_stmt = gimple_build_call_vec (fndecl, vargs);
> +	  if (ifn != IFN_LAST)
> +	    new_stmt = gimple_build_call_internal_vec (ifn, vargs);
> +	  else
> +	    new_stmt = gimple_build_call_vec (fndecl, vargs);
>  	  new_temp = make_ssa_name (vec_dest, new_stmt);
>  	  gimple_call_set_lhs (new_stmt, new_temp);
>  	  vect_finish_stmt_generation (stmt, new_stmt, gsi);
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 9cde091..bb1ab39 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -958,7 +958,6 @@ extern bool supportable_narrowing_operation (enum tree_code, tree, tree,
>  					     int *, vec<tree> *);
>  extern stmt_vec_info new_stmt_vec_info (gimple *stmt, vec_info *);
>  extern void free_stmt_vec_info (gimple *stmt);
> -extern tree vectorizable_function (gcall *, tree, tree);
>  extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
>                                      stmt_vector_for_cost *,
>  				    stmt_vector_for_cost *);
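
The selection order that the patch above introduces in vectorizable_call
(internal function first, target built-in as the fallback) can be
summarised with the simplified model below.  It is a sketch under
stated assumptions rather than the real implementation: the target
struct, its two flags, info_for and choose are hypothetical, and the
flags stand in for direct_internal_fn_supported_p and for the target
hooks returning a non-null decl.  The direct_info fields copy the
bit-field layout from the patch.

```cpp
#include <cassert>

// Toy internal function codes; IFN_LAST means "none".
enum internal_fn_code { IFN_SQRT, IFN_MASK_LOAD, IFN_LAST };

// Same bit-field layout as direct_internal_fn_info in the patch.
struct direct_info
{
  signed int type0 : 8;
  signed int type1 : 8;
  unsigned int vectorizable : 16;
};

enum choice { USE_INTERNAL_FN, USE_TARGET_BUILTIN, NOT_VECTORIZABLE };

// Hypothetical summary of what the target provides.
struct target
{
  bool has_vector_optab;    // models direct_internal_fn_supported_p
  bool has_vector_builtin;  // models a target hook returning a decl
};

// Illustrative table entries, following unary_direct, mask_load_direct
// and not_direct from the patch.
inline direct_info
info_for (internal_fn_code ifn)
{
  assert (ifn <= IFN_LAST);
  direct_info unary = { 0, 0, 1 };
  direct_info mask_load = { -1, -1, 0 };
  direct_info not_direct = { -2, -2, 0 };
  if (ifn == IFN_SQRT)
    return unary;
  if (ifn == IFN_MASK_LOAD)
    return mask_load;
  return not_direct;
}

// Try the internal function first; fall back on a target built-in.
inline choice
choose (internal_fn_code ifn, const target &t)
{
  if (ifn != IFN_LAST)
    {
      direct_info info = info_for (ifn);
      if (info.vectorizable && t.has_vector_optab)
        return USE_INTERNAL_FN;
    }
  if (t.has_vector_builtin)
    return USE_TARGET_BUILTIN;
  return NOT_VECTORIZABLE;
}
```

In this model a target that defines the vector optab never needs a
target-specific built-in for a pointwise function, and targets without
the optab keep their existing behaviour through the fallback.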

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Ping: [PATCH 3/6] Vectorize internal functions
  2015-11-17  9:30   ` Ping: " Richard Sandiford
@ 2015-11-17 14:33     ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2015-11-17 14:33 UTC (permalink / raw)
  To: GCC Patches, richard.sandiford

On Tue, Nov 17, 2015 at 10:30 AM, Richard Sandiford
<richard.sandiford@arm.com> wrote:
> Thanks for all the reviews for this series.  I think the patch below
> is the only target-independent one that hasn't had any comments.

This patch is ok.

Thanks,
Richard.

> Richard
>
> Richard Sandiford <richard.sandiford@arm.com> writes:
>> This patch tries to vectorize built-in and internal functions as
>> internal functions first, falling back on the current built-in
>> target hooks otherwise.
>>
>>
>> gcc/
>>       * internal-fn.h (direct_internal_fn_info): Add vectorizable flag.
>>       * internal-fn.c (direct_internal_fn_array): Update accordingly.
>>       * tree-vectorizer.h (vectorizable_function): Delete.
>>       * tree-vect-stmts.c: Include internal-fn.h.
>>       (vectorizable_internal_function): New function.
>>       (vectorizable_function): Inline into...
>>       (vectorizable_call): ...here.  Explicitly reject calls that read
>>       from or write to memory.  Try using an internal function before
>>       falling back on the old vectorizable_function behavior.
>>
>> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> index 898c83d..a5bda2f 100644
>> --- a/gcc/internal-fn.c
>> +++ b/gcc/internal-fn.c
>> @@ -69,13 +69,13 @@ init_internal_fns ()
>>
>>  /* Create static initializers for the information returned by
>>     direct_internal_fn.  */
>> -#define not_direct { -2, -2 }
>> -#define mask_load_direct { -1, -1 }
>> -#define load_lanes_direct { -1, -1 }
>> -#define mask_store_direct { 3, 3 }
>> -#define store_lanes_direct { 0, 0 }
>> -#define unary_direct { 0, 0 }
>> -#define binary_direct { 0, 0 }
>> +#define not_direct { -2, -2, false }
>> +#define mask_load_direct { -1, -1, false }
>> +#define load_lanes_direct { -1, -1, false }
>> +#define mask_store_direct { 3, 3, false }
>> +#define store_lanes_direct { 0, 0, false }
>> +#define unary_direct { 0, 0, true }
>> +#define binary_direct { 0, 0, true }
>>
>>  const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = {
>>  #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct,
>> diff --git a/gcc/internal-fn.h b/gcc/internal-fn.h
>> index 6cb123f..aea6abd 100644
>> --- a/gcc/internal-fn.h
>> +++ b/gcc/internal-fn.h
>> @@ -134,6 +134,14 @@ struct direct_internal_fn_info
>>       function isn't directly mapped to an optab.  */
>>    signed int type0 : 8;
>>    signed int type1 : 8;
>> +  /* True if the function is pointwise, so that it can be vectorized by
>> +     converting the return type and all argument types to vectors of the
>> +     same number of elements.  E.g. we can vectorize an IFN_SQRT on
>> +     floats as an IFN_SQRT on vectors of N floats.
>> +
>> +     This only needs 1 bit, but occupies the full 16 to ensure a nice
>> +     layout.  */
>> +  unsigned int vectorizable : 16;
>>  };
>>
>>  extern const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1];
>> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
>> index 75389c4..1142142 100644
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -47,6 +47,7 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "tree-scalar-evolution.h"
>>  #include "tree-vectorizer.h"
>>  #include "builtins.h"
>> +#include "internal-fn.h"
>>
>>  /* For lang_hooks.types.type_for_mode.  */
>>  #include "langhooks.h"
>> @@ -1632,27 +1633,32 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt,
>>      add_stmt_to_eh_lp (vec_stmt, lp_nr);
>>  }
>>
>> -/* Checks if CALL can be vectorized in type VECTYPE.  Returns
>> -   a function declaration if the target has a vectorized version
>> -   of the function, or NULL_TREE if the function cannot be vectorized.  */
>> +/* We want to vectorize a call to combined function CFN with function
>> +   decl FNDECL, using VECTYPE_OUT as the type of the output and VECTYPE_IN
>> +   as the types of all inputs.  Check whether this is possible using
>> +   an internal function, returning its code if so or IFN_LAST if not.  */
>>
>> -tree
>> -vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
>> +static internal_fn
>> +vectorizable_internal_function (combined_fn cfn, tree fndecl,
>> +                             tree vectype_out, tree vectype_in)
>>  {
>> -  /* We only handle functions that do not read or clobber memory.  */
>> -  if (gimple_vuse (call))
>> -    return NULL_TREE;
>> -
>> -  combined_fn fn = gimple_call_combined_fn (call);
>> -  if (fn != CFN_LAST)
>> -    return targetm.vectorize.builtin_vectorized_function
>> -      (fn, vectype_out, vectype_in);
>> -
>> -  if (gimple_call_builtin_p (call, BUILT_IN_MD))
>> -    return targetm.vectorize.builtin_md_vectorized_function
>> -      (gimple_call_fndecl (call), vectype_out, vectype_in);
>> -
>> -  return NULL_TREE;
>> +  internal_fn ifn;
>> +  if (internal_fn_p (cfn))
>> +    ifn = as_internal_fn (cfn);
>> +  else
>> +    ifn = associated_internal_fn (fndecl);
>> +  if (ifn != IFN_LAST && direct_internal_fn_p (ifn))
>> +    {
>> +      const direct_internal_fn_info &info = direct_internal_fn (ifn);
>> +      if (info.vectorizable)
>> +     {
>> +       tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
>> +       tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
>> +       if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1)))
>> +         return ifn;
>> +     }
>> +    }
>> +  return IFN_LAST;
>>  }
>>
>>
>> @@ -2232,15 +2238,43 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>    else
>>      return false;
>>
>> +  /* We only handle functions that do not read or clobber memory.  */
>> +  if (gimple_vuse (stmt))
>> +    {
>> +      if (dump_enabled_p ())
>> +     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> +                      "function reads from or writes to memory.\n");
>> +      return false;
>> +    }
>> +
>>    /* For now, we only vectorize functions if a target specific builtin
>>       is available.  TODO -- in some cases, it might be profitable to
>>       insert the calls for pieces of the vector, in order to be able
>>       to vectorize other operations in the loop.  */
>> -  fndecl = vectorizable_function (stmt, vectype_out, vectype_in);
>> -  if (fndecl == NULL_TREE)
>> +  fndecl = NULL_TREE;
>> +  internal_fn ifn = IFN_LAST;
>> +  combined_fn cfn = gimple_call_combined_fn (stmt);
>> +  tree callee = gimple_call_fndecl (stmt);
>> +
>> +  /* First try using an internal function.  */
>> +  if (cfn != CFN_LAST)
>> +    ifn = vectorizable_internal_function (cfn, callee, vectype_out,
>> +                                       vectype_in);
>> +
>> +  /* If that fails, try asking for a target-specific built-in function.  */
>> +  if (ifn == IFN_LAST)
>> +    {
>> +      if (cfn != CFN_LAST)
>> +     fndecl = targetm.vectorize.builtin_vectorized_function
>> +       (cfn, vectype_out, vectype_in);
>> +      else
>> +     fndecl = targetm.vectorize.builtin_md_vectorized_function
>> +       (callee, vectype_out, vectype_in);
>> +    }
>> +
>> +  if (ifn == IFN_LAST && !fndecl)
>>      {
>> -      if (gimple_call_internal_p (stmt)
>> -       && gimple_call_internal_fn (stmt) == IFN_GOMP_SIMD_LANE
>> +      if (cfn == CFN_GOMP_SIMD_LANE
>>         && !slp_node
>>         && loop_vinfo
>>         && LOOP_VINFO_LOOP (loop_vinfo)->simduid
>> @@ -2261,8 +2295,6 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>       }
>>      }
>>
>> -  gcc_assert (!gimple_vuse (stmt));
>> -
>>    if (slp_node || PURE_SLP_STMT (stmt_info))
>>      ncopies = 1;
>>    else if (modifier == NARROW)
>> @@ -2324,7 +2356,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>                     vec<tree> vec_oprndsk = vec_defs[k];
>>                     vargs[k] = vec_oprndsk[i];
>>                   }
>> -               new_stmt = gimple_build_call_vec (fndecl, vargs);
>> +               if (ifn != IFN_LAST)
>> +                 new_stmt = gimple_build_call_internal_vec (ifn, vargs);
>> +               else
>> +                 new_stmt = gimple_build_call_vec (fndecl, vargs);
>>                 new_temp = make_ssa_name (vec_dest, new_stmt);
>>                 gimple_call_set_lhs (new_stmt, new_temp);
>>                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
>> @@ -2372,7 +2407,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>           }
>>         else
>>           {
>> -           new_stmt = gimple_build_call_vec (fndecl, vargs);
>> +           if (ifn != IFN_LAST)
>> +             new_stmt = gimple_build_call_internal_vec (ifn, vargs);
>> +           else
>> +             new_stmt = gimple_build_call_vec (fndecl, vargs);
>>             new_temp = make_ssa_name (vec_dest, new_stmt);
>>             gimple_call_set_lhs (new_stmt, new_temp);
>>           }
>> @@ -2418,7 +2456,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>                     vargs.quick_push (vec_oprndsk[i]);
>>                     vargs.quick_push (vec_oprndsk[i + 1]);
>>                   }
>> -               new_stmt = gimple_build_call_vec (fndecl, vargs);
>> +               if (ifn != IFN_LAST)
>> +                 new_stmt = gimple_build_call_internal_vec (ifn, vargs);
>> +               else
>> +                 new_stmt = gimple_build_call_vec (fndecl, vargs);
>>                 new_temp = make_ssa_name (vec_dest, new_stmt);
>>                 gimple_call_set_lhs (new_stmt, new_temp);
>>                 vect_finish_stmt_generation (stmt, new_stmt, gsi);
>> @@ -2456,7 +2497,10 @@ vectorizable_call (gimple *gs, gimple_stmt_iterator *gsi, gimple **vec_stmt,
>>             vargs.quick_push (vec_oprnd1);
>>           }
>>
>> -       new_stmt = gimple_build_call_vec (fndecl, vargs);
>> +       if (ifn != IFN_LAST)
>> +         new_stmt = gimple_build_call_internal_vec (ifn, vargs);
>> +       else
>> +         new_stmt = gimple_build_call_vec (fndecl, vargs);
>>         new_temp = make_ssa_name (vec_dest, new_stmt);
>>         gimple_call_set_lhs (new_stmt, new_temp);
>>         vect_finish_stmt_generation (stmt, new_stmt, gsi);
>> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
>> index 9cde091..bb1ab39 100644
>> --- a/gcc/tree-vectorizer.h
>> +++ b/gcc/tree-vectorizer.h
>> @@ -958,7 +958,6 @@ extern bool supportable_narrowing_operation (enum tree_code, tree, tree,
>>                                            int *, vec<tree> *);
>>  extern stmt_vec_info new_stmt_vec_info (gimple *stmt, vec_info *);
>>  extern void free_stmt_vec_info (gimple *stmt);
>> -extern tree vectorizable_function (gcall *, tree, tree);
>>  extern void vect_model_simple_cost (stmt_vec_info, int, enum vect_def_type *,
>>                                      stmt_vector_for_cost *,
>>                                   stmt_vector_for_cost *);
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2015-11-17 14:33 UTC | newest]

Thread overview: 19+ messages
2015-11-09 16:20 [PATCH 0/6] Automatically use vector optabs Richard Sandiford
2015-11-09 16:21 ` [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c Richard Sandiford
2015-11-10 10:21   ` Richard Biener
2015-11-10 10:57     ` Richard Sandiford
2015-11-10 14:42       ` Richard Biener
2015-11-10 17:29     ` Joseph Myers
2015-11-10 19:10       ` Richard Biener
2015-11-09 16:25 ` [PATCH 2/6] Make builtin_vectorized_function take a combined_fn Richard Sandiford
2015-11-10 10:36   ` Richard Biener
2015-11-13 12:27     ` Richard Sandiford
2015-11-16 13:58       ` Richard Biener
2015-11-09 16:27 ` [PATCH 3/6] Vectorize internal functions Richard Sandiford
2015-11-17  9:30   ` Ping: " Richard Sandiford
2015-11-17 14:33     ` Richard Biener
2015-11-09 16:28 ` [PATCH 4/6] Simplify ix86_builtin_vectorized_function Richard Sandiford
2015-11-09 19:55   ` Uros Bizjak
2015-11-09 16:30 ` [PATCH 5/6] Simplify rs6000_builtin_vectorized_function Richard Sandiford
2015-11-09 17:47   ` David Edelsohn
2015-11-09 16:32 ` [PATCH 6/6] Simplify aarch64_builtin_vectorized_function Richard Sandiford
