public inbox for gcc-patches@gcc.gnu.org
* [PATCHv5 00/18] Replace the Power target-specific builtin machinery
@ 2021-09-01 16:13 Bill Schmidt
  2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
                   ` (18 more replies)
  0 siblings, 19 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Hi!

Original patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html

V2 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html

V3 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573020.html

V4 patch series here:
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576284.html

Thanks for all the reviews so far!  We're into the home stretch.  I needed
to rebase this series again in order to pick up some changes from upstream.

Patch 01/18 is a reposting of V4 patch 19/34, addressing some of the
review comments.  A full refactoring of this code will be done later,
once the patch series has had a chance to burn in.  That patch hasn't
yet been formally approved.

Patch 02/18 is new, and is a minor bug fix.

Patches 03/18 through 17/18 correspond to V4 patches 20/34 through 34/34.
These were adjusted for upstream changes, and I did some formatting
cleanups.  I also provided better descriptions for some of the patches.

Patch 18/18 is new, and improves the parser to handle escape-newline
input.  With that in place, it cleans up all the long lines in the
input files.
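
As an illustration, an entry like this one from the .def file can now
be wrapped (assuming here that backslash-newline is the escape
convention the parser change supports):

  const signed int __builtin_dtstsfi_eq_dd (const int<6>, \
                                            _Decimal64);
    TSTSFI_EQ_DD dfptstsfi_eq_dd {}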

Bootstrapped and tested on powerpc64le-linux-gnu (P10) and
powerpc64-linux-gnu (32- and 64-bit, P8).  There are no regressions for
little endian.  There are a small handful of big-endian regressions that
have crept in, and I'll post patches for those after I work through them.
But no need to hold up reviews on the rest of this in the meantime.

Thanks again for all of the helpful reviews so far!

Bill


Bill Schmidt (18):
  rs6000: Handle overloads during program parsing
  rs6000: Move __builtin_mffsl to the [always] stanza
  rs6000: Handle gimple folding of target built-ins
  rs6000: Handle some recent MMA builtin changes
  rs6000: Support for vectorizing built-in functions
  rs6000: Builtin expansion, part 1
  rs6000: Builtin expansion, part 2
  rs6000: Builtin expansion, part 3
  rs6000: Builtin expansion, part 4
  rs6000: Builtin expansion, part 5
  rs6000: Builtin expansion, part 6
  rs6000: Update rs6000_builtin_decl
  rs6000: Miscellaneous uses of rs6000_builtins_decl_x
  rs6000: Debug support
  rs6000: Update altivec.h for automated interfaces
  rs6000: Test case adjustments
  rs6000: Enable the new builtin support
  rs6000: Add escape-newline support for builtins files

 gcc/config/rs6000/altivec.h                   |  519 +--
 gcc/config/rs6000/rs6000-builtin-new.def      |  442 ++-
 gcc/config/rs6000/rs6000-c.c                  | 1088 ++++++
 gcc/config/rs6000/rs6000-call.c               | 3132 +++++++++++++++--
 gcc/config/rs6000/rs6000-gen-builtins.c       |  312 +-
 gcc/config/rs6000/rs6000.c                    |  272 +-
 .../powerpc/bfp/scalar-extract-exp-2.c        |    2 +-
 .../powerpc/bfp/scalar-extract-sig-2.c        |    2 +-
 .../powerpc/bfp/scalar-insert-exp-2.c         |    2 +-
 .../powerpc/bfp/scalar-insert-exp-5.c         |    2 +-
 .../powerpc/bfp/scalar-insert-exp-8.c         |    2 +-
 .../powerpc/bfp/scalar-test-neg-2.c           |    2 +-
 .../powerpc/bfp/scalar-test-neg-3.c           |    2 +-
 .../powerpc/bfp/scalar-test-neg-5.c           |    2 +-
 .../gcc.target/powerpc/byte-in-set-2.c        |    2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c     |    2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |    2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c     |   14 +-
 .../powerpc/fold-vec-splat-floatdouble.c      |    4 +-
 .../powerpc/fold-vec-splat-longlong.c         |   10 +-
 .../powerpc/fold-vec-splat-misc-invalid.c     |    8 +-
 .../gcc.target/powerpc/int_128bit-runnable.c  |    6 +-
 .../gcc.target/powerpc/p8vector-builtin-8.c   |    1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |    2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c    |   12 +-
 .../gcc.target/powerpc/pragma_misc9.c         |    2 +-
 .../gcc.target/powerpc/pragma_power8.c        |    2 +
 .../gcc.target/powerpc/pragma_power9.c        |    3 +
 .../powerpc/test_fpscr_drn_builtin_error.c    |    4 +-
 .../powerpc/test_fpscr_rn_builtin_error.c     |   12 +-
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c |    3 +-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |    2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c     |    2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |    2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |    2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c    |    2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |    2 +-
 42 files changed, 4803 insertions(+), 1089 deletions(-)

-- 
2.27.0



* [PATCH 01/18] rs6000: Handle overloads during program parsing
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-09-13 17:17   ` will schmidt
  2021-09-13 23:53   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza Bill Schmidt
                   ` (17 subsequent siblings)
  18 siblings, 2 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Although this patch looks quite large, the changes are fairly minimal.
Most of it duplicates the large function that performs overload
resolution, using the automatically generated data structures instead of
the old hand-generated ones.  This doesn't make the patch terribly easy
to review, unfortunately.  Just be aware that generally we aren't
changing the logic or functionality of overload handling.
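
As a reminder of what is being resolved: an overloaded builtin such as
vec_add maps to a different type-specific builtin depending on its
argument types.  A minimal sketch of the user-visible behavior (the
specific underlying forms named here are illustrative):

  #include <altivec.h>

  vector signed int
  add_si (vector signed int a, vector signed int b)
  {
    return vec_add (a, b);	/* Resolves to the vadduwm form.  */
  }

  vector float
  add_sf (vector float a, vector float b)
  {
    return vec_add (a, b);	/* Resolves to the xvaddsp form.  */
  }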

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
	(altivec_resolve_new_overloaded_builtin): New forward decl.
	(rs6000_new_builtin_type_compatible): New function.
	(altivec_resolve_overloaded_builtin): Call
	altivec_resolve_new_overloaded_builtin.
	(altivec_build_new_resolved_builtin): New function.
	(altivec_resolve_new_overloaded_builtin): Likewise.
	* config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
	Likewise.
	* config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
	name of rs6000_new_builtin_is_supported.
---
 gcc/config/rs6000/rs6000-c.c            | 1088 +++++++++++++++++++++++
 gcc/config/rs6000/rs6000-call.c         |   53 ++
 gcc/config/rs6000/rs6000-gen-builtins.c |    2 +-
 3 files changed, 1142 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index afcb5bb6e39..aafb4e6a98f 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -35,6 +35,9 @@
 #include "langhooks.h"
 #include "c/c-tree.h"
 
+#include "rs6000-builtins.h"
+
+static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
 
 
 /* Handle the machine specific pragma longcall.  Its syntax is
@@ -811,6 +814,30 @@ is_float128_p (tree t)
 	      && t == long_double_type_node));
 }
   
+static bool
+rs6000_new_builtin_type_compatible (tree t, tree u)
+{
+  if (t == error_mark_node)
+    return false;
+
+  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
+    return true;
+
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+      && is_float128_p (t) && is_float128_p (u))
+    return true;
+
+  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
+    {
+      t = TREE_TYPE (t);
+      u = TREE_TYPE (u);
+      if (TYPE_READONLY (u))
+	t = build_qualified_type (t, TYPE_QUAL_CONST);
+    }
+
+  return lang_hooks.types_compatible_p (t, u);
+}
+
 static inline bool
 rs6000_builtin_type_compatible (tree t, int id)
 {
@@ -927,6 +954,10 @@ tree
 altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 				    void *passed_arglist)
 {
+  if (new_builtins_are_live)
+    return altivec_resolve_new_overloaded_builtin (loc, fndecl,
+						   passed_arglist);
+
   vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
   unsigned int nargs = vec_safe_length (arglist);
   enum rs6000_builtins fcode
@@ -1930,3 +1961,1060 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
     return error_mark_node;
   }
 }
+
+/* Build a tree for a function call to an Altivec non-overloaded builtin.
+   The overloaded builtin that matched the types and args is identified
+   by OVLD_ID, and the builtin it resolves to by BIF_ID; FNTYPE and
+   RET_TYPE give the target builtin's type and return type.  The N
+   arguments are given in ARGS.
+
+   Actually the only thing this does is call fold_convert on ARGS, with
+   a small exception for vec_{all,any}_{ge,le} predicates.  */
+
+static tree
+altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
+				    tree ret_type,
+				    rs6000_gen_builtins bif_id,
+				    rs6000_gen_builtins ovld_id)
+{
+  tree argtypes = TYPE_ARG_TYPES (fntype);
+  tree arg_type[MAX_OVLD_ARGS];
+  tree fndecl = rs6000_builtin_decls_x[bif_id];
+  tree call;
+
+  for (int i = 0; i < n; i++)
+    arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);
+
+  /* The AltiVec overloading implementation is overall gross, but this
+     is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
+     are completely different for floating-point vs. integer vector
+     types, because the former has vcmpgefp, but the latter should use
+     vcmpgtXX.
+
+     In practice, the second and third arguments are swapped, and the
+     condition (LT vs. EQ, which is recognizable by bit 1 of the first
+     argument) is reversed.  Patch the arguments here before building
+     the resolved CALL_EXPR.  */
+  if (n == 3
+      && ovld_id == RS6000_OVLD_VEC_CMPGE_P
+      && bif_id != RS6000_BIF_VCMPGEFP_P
+      && bif_id != RS6000_BIF_XVCMPGEDP_P)
+    {
+      std::swap (args[1], args[2]);
+      std::swap (arg_type[1], arg_type[2]);
+
+      args[0] = fold_build2 (BIT_XOR_EXPR, TREE_TYPE (args[0]), args[0],
+			     build_int_cst (NULL_TREE, 2));
+    }
+
+  /* If the number of arguments to an overloaded function increases,
+     we must expand this switch.  */
+  gcc_assert (MAX_OVLD_ARGS <= 4);
+
+  switch (n)
+    {
+    case 0:
+      call = build_call_expr (fndecl, 0);
+      break;
+    case 1:
+      call = build_call_expr (fndecl, 1,
+			      fully_fold_convert (arg_type[0], args[0]));
+      break;
+    case 2:
+      call = build_call_expr (fndecl, 2,
+			      fully_fold_convert (arg_type[0], args[0]),
+			      fully_fold_convert (arg_type[1], args[1]));
+      break;
+    case 3:
+      call = build_call_expr (fndecl, 3,
+			      fully_fold_convert (arg_type[0], args[0]),
+			      fully_fold_convert (arg_type[1], args[1]),
+			      fully_fold_convert (arg_type[2], args[2]));
+      break;
+    case 4:
+      call = build_call_expr (fndecl, 4,
+			      fully_fold_convert (arg_type[0], args[0]),
+			      fully_fold_convert (arg_type[1], args[1]),
+			      fully_fold_convert (arg_type[2], args[2]),
+			      fully_fold_convert (arg_type[3], args[3]));
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  return fold_convert (ret_type, call);
+}
+
+/* Implementation of the resolve_overloaded_builtin target hook, to
+   support Altivec's overloaded builtins.  */
+
+static tree
+altivec_resolve_new_overloaded_builtin (location_t loc, tree fndecl,
+					void *passed_arglist)
+{
+  vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
+  unsigned int nargs = vec_safe_length (arglist);
+  enum rs6000_gen_builtins fcode
+    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  tree fnargs = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
+  tree types[MAX_OVLD_ARGS], args[MAX_OVLD_ARGS];
+  unsigned int n;
+
+  /* Return immediately if this isn't an overload.  */
+  if (fcode <= RS6000_OVLD_NONE)
+    return NULL_TREE;
+
+  unsigned int adj_fcode = fcode - RS6000_OVLD_NONE;
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "altivec_resolve_overloaded_builtin, code = %4d, %s\n",
+	     (int) fcode, IDENTIFIER_POINTER (DECL_NAME (fndecl)));
+
+  /* vec_lvsl and vec_lvsr are deprecated for use with LE element order.  */
+  if (fcode == RS6000_OVLD_VEC_LVSL && !BYTES_BIG_ENDIAN)
+    warning (OPT_Wdeprecated,
+	     "%<vec_lvsl%> is deprecated for little endian; use "
+	     "assignment for unaligned loads and stores");
+  else if (fcode == RS6000_OVLD_VEC_LVSR && !BYTES_BIG_ENDIAN)
+    warning (OPT_Wdeprecated,
+	     "%<vec_lvsr%> is deprecated for little endian; use "
+	     "assignment for unaligned loads and stores");
+
+  if (fcode == RS6000_OVLD_VEC_MUL)
+    {
+      /* vec_mul needs to be special cased because there are no instructions
+	 for it for the {un}signed char, {un}signed short, and {un}signed int
+	 types.  */
+      if (nargs != 2)
+	{
+	  error ("builtin %qs only accepts 2 arguments", "vec_mul");
+	  return error_mark_node;
+	}
+
+      tree arg0 = (*arglist)[0];
+      tree arg0_type = TREE_TYPE (arg0);
+      tree arg1 = (*arglist)[1];
+      tree arg1_type = TREE_TYPE (arg1);
+
+      /* Both arguments must be vectors and the types must be compatible.  */
+      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+	goto bad;
+      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type))
+	goto bad;
+
+      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+	{
+	  case E_QImode:
+	  case E_HImode:
+	  case E_SImode:
+	  case E_DImode:
+	  case E_TImode:
+	    {
+	      /* For scalar types just use a multiply expression.  */
+	      return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
+				      fold_convert (TREE_TYPE (arg0), arg1));
+	    }
+	  case E_SFmode:
+	    {
+	      /* For floats use the xvmulsp instruction directly.  */
+	      tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULSP];
+	      return build_call_expr (call, 2, arg0, arg1);
+	    }
+	  case E_DFmode:
+	    {
+	      /* For doubles use the xvmuldp instruction directly.  */
+	      tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULDP];
+	      return build_call_expr (call, 2, arg0, arg1);
+	    }
+	  /* Other types are errors.  */
+	  default:
+	    goto bad;
+	}
+    }
+
+  if (fcode == RS6000_OVLD_VEC_CMPNE)
+    {
+      /* vec_cmpne needs to be special cased because there are no instructions
+	 for it (prior to power 9).  */
+      if (nargs != 2)
+	{
+	  error ("builtin %qs only accepts 2 arguments", "vec_cmpne");
+	  return error_mark_node;
+	}
+
+      tree arg0 = (*arglist)[0];
+      tree arg0_type = TREE_TYPE (arg0);
+      tree arg1 = (*arglist)[1];
+      tree arg1_type = TREE_TYPE (arg1);
+
+      /* Both arguments must be vectors and the types must be compatible.  */
+      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+	goto bad;
+      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type))
+	goto bad;
+
+      /* Power9 instructions provide the most efficient implementation of
+	 RS6000_OVLD_VEC_CMPNE if the mode is not DImode or TImode or
+	 SFmode or DFmode.  */
+      if (!TARGET_P9_VECTOR
+	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == DImode)
+	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == TImode)
+	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == SFmode)
+	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == DFmode))
+	{
+	  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+	    {
+	      /* vec_cmpne (va, vb) == vec_nor (vec_cmpeq (va, vb),
+		 vec_cmpeq (va, vb)).  */
+	      /* Note: vec_nand also works, but the optimizers change
+		 vec_nand into vec_nor anyway.  */
+	    case E_QImode:
+	    case E_HImode:
+	    case E_SImode:
+	    case E_DImode:
+	    case E_TImode:
+	    case E_SFmode:
+	    case E_DFmode:
+	      {
+		/* call = vec_cmpeq (va, vb)
+		   result = vec_nor (call, call).  */
+		vec<tree, va_gc> *params = make_tree_vector ();
+		vec_safe_push (params, arg0);
+		vec_safe_push (params, arg1);
+		tree call = altivec_resolve_new_overloaded_builtin
+		  (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_CMPEQ],
+		   params);
+		/* Use save_expr to ensure that operands used more than once
+		   that may have side effects (like calls) are only evaluated
+		   once.  */
+		call = save_expr (call);
+		params = make_tree_vector ();
+		vec_safe_push (params, call);
+		vec_safe_push (params, call);
+		return altivec_resolve_new_overloaded_builtin
+		  (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_NOR], params);
+	      }
+	      /* Other types are errors.  */
+	    default:
+	      goto bad;
+	    }
+	}
+      /* Otherwise, fall through and process the Power9 alternative below.  */
+    }
+
+  if (fcode == RS6000_OVLD_VEC_ADDE || fcode == RS6000_OVLD_VEC_SUBE)
+    {
+      /* vec_adde and vec_sube need to be special cased because there is
+	 no instruction for the {un}signed int version.  */
+      if (nargs != 3)
+	{
+	  const char *name;
+	  name = fcode == RS6000_OVLD_VEC_ADDE ? "vec_adde" : "vec_sube";
+	  error ("builtin %qs only accepts 3 arguments", name);
+	  return error_mark_node;
+	}
+
+      tree arg0 = (*arglist)[0];
+      tree arg0_type = TREE_TYPE (arg0);
+      tree arg1 = (*arglist)[1];
+      tree arg1_type = TREE_TYPE (arg1);
+      tree arg2 = (*arglist)[2];
+      tree arg2_type = TREE_TYPE (arg2);
+
+      /* All 3 arguments must be vectors of (signed or unsigned) (int or
+	 __int128) and the types must be compatible.  */
+      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+	goto bad;
+      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type)
+	  || !lang_hooks.types_compatible_p (arg1_type, arg2_type))
+	goto bad;
+
+      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+	{
+	  /* For {un}signed ints,
+	     vec_adde (va, vb, carryv) == vec_add (vec_add (va, vb),
+						   vec_and (carryv, 1)).
+	     vec_sube (va, vb, carryv) == vec_sub (vec_sub (va, vb),
+						   vec_and (carryv, 1)).  */
+	  case E_SImode:
+	    {
+	      tree add_sub_builtin;
+
+	      vec<tree, va_gc> *params = make_tree_vector ();
+	      vec_safe_push (params, arg0);
+	      vec_safe_push (params, arg1);
+
+	      if (fcode == RS6000_OVLD_VEC_ADDE)
+		add_sub_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADD];
+	      else
+		add_sub_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUB];
+
+	      tree call
+		= altivec_resolve_new_overloaded_builtin (loc,
+							  add_sub_builtin,
+							  params);
+	      tree const1 = build_int_cstu (TREE_TYPE (arg0_type), 1);
+	      tree ones_vector = build_vector_from_val (arg0_type, const1);
+	      tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR, arg0_type,
+					       arg2, ones_vector);
+	      params = make_tree_vector ();
+	      vec_safe_push (params, call);
+	      vec_safe_push (params, and_expr);
+	      return altivec_resolve_new_overloaded_builtin (loc,
+							     add_sub_builtin,
+							     params);
+	    }
+	  /* For {un}signed __int128s use the vaddeuqm/vsubeuqm instructions
+	     directly.  */
+	  case E_TImode:
+	    break;
+
+	  /* Types other than {un}signed int and {un}signed __int128
+	     are errors.  */
+	  default:
+	    goto bad;
+	}
+    }
+
+  if (fcode == RS6000_OVLD_VEC_ADDEC || fcode == RS6000_OVLD_VEC_SUBEC)
+    {
+      /* vec_addec and vec_subec need to be special cased because there is
+	 no instruction for the {un}signed int version.  */
+      if (nargs != 3)
+	{
+	  const char *name;
+	  name = fcode == RS6000_OVLD_VEC_ADDEC ? "vec_addec" : "vec_subec";
+	  error ("builtin %qs only accepts 3 arguments", name);
+	  return error_mark_node;
+	}
+
+      tree arg0 = (*arglist)[0];
+      tree arg0_type = TREE_TYPE (arg0);
+      tree arg1 = (*arglist)[1];
+      tree arg1_type = TREE_TYPE (arg1);
+      tree arg2 = (*arglist)[2];
+      tree arg2_type = TREE_TYPE (arg2);
+
+      /* All 3 arguments must be vectors of (signed or unsigned) (int or
+	 __int128) and the types must be compatible.  */
+      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
+	goto bad;
+      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type)
+	  || !lang_hooks.types_compatible_p (arg1_type, arg2_type))
+	goto bad;
+
+      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
+	{
+	  /* For {un}signed ints,
+	      vec_addec (va, vb, carryv) ==
+				vec_or (vec_addc (va, vb),
+					vec_addc (vec_add (va, vb),
+						  vec_and (carryv, 0x1))).  */
+	  case E_SImode:
+	    {
+	    /* Use save_expr to ensure that operands used more than once
+	       that may have side effects (like calls) are only evaluated
+	       once.  */
+	    tree as_builtin;
+	    tree as_c_builtin;
+
+	    arg0 = save_expr (arg0);
+	    arg1 = save_expr (arg1);
+	    vec<tree, va_gc> *params = make_tree_vector ();
+	    vec_safe_push (params, arg0);
+	    vec_safe_push (params, arg1);
+
+	    if (fcode == RS6000_OVLD_VEC_ADDEC)
+	      as_c_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADDC];
+	    else
+	      as_c_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUBC];
+
+	    tree call1 = altivec_resolve_new_overloaded_builtin (loc,
+								 as_c_builtin,
+								 params);
+	    params = make_tree_vector ();
+	    vec_safe_push (params, arg0);
+	    vec_safe_push (params, arg1);
+
+	    if (fcode == RS6000_OVLD_VEC_ADDEC)
+	      as_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADD];
+	    else
+	      as_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUB];
+
+	    tree call2 = altivec_resolve_new_overloaded_builtin (loc,
+								 as_builtin,
+								 params);
+	    tree const1 = build_int_cstu (TREE_TYPE (arg0_type), 1);
+	    tree ones_vector = build_vector_from_val (arg0_type, const1);
+	    tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR, arg0_type,
+					     arg2, ones_vector);
+	    params = make_tree_vector ();
+	    vec_safe_push (params, call2);
+	    vec_safe_push (params, and_expr);
+	    call2 = altivec_resolve_new_overloaded_builtin (loc, as_c_builtin,
+							    params);
+	    params = make_tree_vector ();
+	    vec_safe_push (params, call1);
+	    vec_safe_push (params, call2);
+	    tree or_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_OR];
+	    return altivec_resolve_new_overloaded_builtin (loc, or_builtin,
+							   params);
+	    }
+	  /* For {un}signed __int128s use the vaddecuq/vsubecuq
+	     instructions.  This occurs through normal processing.  */
+	  case E_TImode:
+	    break;
+
+	  /* Types other than {un}signed int and {un}signed __int128
+	     are errors.  */
+	  default:
+	    goto bad;
+	}
+    }
+
+  /* For now treat vec_splats and vec_promote as the same.  */
+  if (fcode == RS6000_OVLD_VEC_SPLATS || fcode == RS6000_OVLD_VEC_PROMOTE)
+    {
+      tree type, arg;
+      int size;
+      int i;
+      bool unsigned_p;
+      vec<constructor_elt, va_gc> *vec;
+      const char *name;
+      name = fcode == RS6000_OVLD_VEC_SPLATS ? "vec_splats" : "vec_promote";
+
+      if (fcode == RS6000_OVLD_VEC_SPLATS && nargs != 1)
+	{
+	  error ("builtin %qs only accepts 1 argument", name);
+	  return error_mark_node;
+	}
+      if (fcode == RS6000_OVLD_VEC_PROMOTE && nargs != 2)
+	{
+	  error ("builtin %qs only accepts 2 arguments", name);
+	  return error_mark_node;
+	}
+      /* Ignore promote's element argument.  */
+      if (fcode == RS6000_OVLD_VEC_PROMOTE
+	  && !INTEGRAL_TYPE_P (TREE_TYPE ((*arglist)[1])))
+	goto bad;
+
+      arg = (*arglist)[0];
+      type = TREE_TYPE (arg);
+      if (!SCALAR_FLOAT_TYPE_P (type)
+	  && !INTEGRAL_TYPE_P (type))
+	goto bad;
+      unsigned_p = TYPE_UNSIGNED (type);
+      switch (TYPE_MODE (type))
+	{
+	  case E_TImode:
+	    type = (unsigned_p ? unsigned_V1TI_type_node : V1TI_type_node);
+	    size = 1;
+	    break;
+	  case E_DImode:
+	    type = (unsigned_p ? unsigned_V2DI_type_node : V2DI_type_node);
+	    size = 2;
+	    break;
+	  case E_SImode:
+	    type = (unsigned_p ? unsigned_V4SI_type_node : V4SI_type_node);
+	    size = 4;
+	    break;
+	  case E_HImode:
+	    type = (unsigned_p ? unsigned_V8HI_type_node : V8HI_type_node);
+	    size = 8;
+	    break;
+	  case E_QImode:
+	    type = (unsigned_p ? unsigned_V16QI_type_node : V16QI_type_node);
+	    size = 16;
+	    break;
+	  case E_SFmode:
+	    type = V4SF_type_node;
+	    size = 4;
+	    break;
+	  case E_DFmode:
+	    type = V2DF_type_node;
+	    size = 2;
+	    break;
+	  default:
+	    goto bad;
+	}
+      arg = save_expr (fold_convert (TREE_TYPE (type), arg));
+      vec_alloc (vec, size);
+      for (i = 0; i < size; i++)
+	{
+	  constructor_elt elt = {NULL_TREE, arg};
+	  vec->quick_push (elt);
+	}
+      return build_constructor (type, vec);
+    }
+
+  /* For now use pointer tricks to do the extraction, unless we are on VSX
+     extracting a double from a constant offset.  */
+  if (fcode == RS6000_OVLD_VEC_EXTRACT)
+    {
+      tree arg1;
+      tree arg1_type;
+      tree arg2;
+      tree arg1_inner_type;
+      tree decl, stmt;
+      tree innerptrtype;
+      machine_mode mode;
+
+      /* There must be exactly two arguments.  */
+      if (nargs != 2)
+	{
+	  error ("builtin %qs only accepts 2 arguments", "vec_extract");
+	  return error_mark_node;
+	}
+
+      arg2 = (*arglist)[1];
+      arg1 = (*arglist)[0];
+      arg1_type = TREE_TYPE (arg1);
+
+      if (TREE_CODE (arg1_type) != VECTOR_TYPE)
+	goto bad;
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
+	goto bad;
+
+      /* See if we can optimize vec_extracts with the current VSX instruction
+	 set.  */
+      mode = TYPE_MODE (arg1_type);
+      if (VECTOR_MEM_VSX_P (mode))
+	{
+	  tree call = NULL_TREE;
+	  int nunits = GET_MODE_NUNITS (mode);
+
+	  arg2 = fold_for_warn (arg2);
+
+	  /* If the second argument is an integer constant, generate
+	     the built-in code if we can.  We need 64-bit and direct
+	     move to extract the small integer vectors.  */
+	  if (TREE_CODE (arg2) == INTEGER_CST)
+	    {
+	      wide_int selector = wi::to_wide (arg2);
+	      selector = wi::umod_trunc (selector, nunits);
+	      arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
+	      switch (mode)
+		{
+		default:
+		  break;
+
+		case E_V1TImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V1TI];
+		  break;
+
+		case E_V2DFmode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DF];
+		  break;
+
+		case E_V2DImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DI];
+		  break;
+
+		case E_V4SFmode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SF];
+		  break;
+
+		case E_V4SImode:
+		  if (TARGET_DIRECT_MOVE_64BIT)
+		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SI];
+		  break;
+
+		case E_V8HImode:
+		  if (TARGET_DIRECT_MOVE_64BIT)
+		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V8HI];
+		  break;
+
+		case E_V16QImode:
+		  if (TARGET_DIRECT_MOVE_64BIT)
+		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V16QI];
+		  break;
+		}
+	    }
+
+	  /* If the second argument is variable, we can optimize it if we are
+	     generating 64-bit code on a machine with direct move.  */
+	  else if (TREE_CODE (arg2) != INTEGER_CST && TARGET_DIRECT_MOVE_64BIT)
+	    {
+	      switch (mode)
+		{
+		default:
+		  break;
+
+		case E_V2DFmode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DF];
+		  break;
+
+		case E_V2DImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DI];
+		  break;
+
+		case E_V4SFmode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SF];
+		  break;
+
+		case E_V4SImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SI];
+		  break;
+
+		case E_V8HImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V8HI];
+		  break;
+
+		case E_V16QImode:
+		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V16QI];
+		  break;
+		}
+	    }
+
+	  if (call)
+	    {
+	      tree result = build_call_expr (call, 2, arg1, arg2);
+	      /* Coerce the result to the vector element type.  May be
+		 a no-op.  */
+	      arg1_inner_type = TREE_TYPE (arg1_type);
+	      result = fold_convert (arg1_inner_type, result);
+	      return result;
+	    }
+	}
+
+      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2). */
+      arg1_inner_type = TREE_TYPE (arg1_type);
+      arg2 = build_binary_op (loc, BIT_AND_EXPR, arg2,
+			      build_int_cst (TREE_TYPE (arg2),
+					     TYPE_VECTOR_SUBPARTS (arg1_type)
+					     - 1), 0);
+      decl = build_decl (loc, VAR_DECL, NULL_TREE, arg1_type);
+      DECL_EXTERNAL (decl) = 0;
+      TREE_PUBLIC (decl) = 0;
+      DECL_CONTEXT (decl) = current_function_decl;
+      TREE_USED (decl) = 1;
+      TREE_TYPE (decl) = arg1_type;
+      TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
+      if (c_dialect_cxx ())
+	{
+	  stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
+			 NULL_TREE, NULL_TREE);
+	  SET_EXPR_LOCATION (stmt, loc);
+	}
+      else
+	{
+	  DECL_INITIAL (decl) = arg1;
+	  stmt = build1 (DECL_EXPR, arg1_type, decl);
+	  TREE_ADDRESSABLE (decl) = 1;
+	  SET_EXPR_LOCATION (stmt, loc);
+	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
+	}
+
+      innerptrtype = build_pointer_type (arg1_inner_type);
+
+      stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
+      stmt = convert (innerptrtype, stmt);
+      stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
+      stmt = build_indirect_ref (loc, stmt, RO_NULL);
+
+      /* PR83660: We mark this as having side effects so that
+	 downstream in fold_build_cleanup_point_expr () it will get a
+	 CLEANUP_POINT_EXPR.  If it does not we can run into an ICE
+	 later in gimplify_cleanup_point_expr ().  Potentially this
+	 causes missed optimization because there actually is no side
+	 effect.  */
+      if (c_dialect_cxx ())
+	TREE_SIDE_EFFECTS (stmt) = 1;
+
+      return stmt;
+    }
+
+  /* For now use pointer tricks to do the insertion, unless we are on VSX
+     inserting a double to a constant offset.  */
+  if (fcode == RS6000_OVLD_VEC_INSERT)
+    {
+      tree arg0;
+      tree arg1;
+      tree arg2;
+      tree arg1_type;
+      tree decl, stmt;
+      machine_mode mode;
+
+      /* There must be exactly three arguments.  */
+      if (nargs != 3)
+	{
+	  error ("builtin %qs only accepts 3 arguments", "vec_insert");
+	  return error_mark_node;
+	}
+
+      arg0 = (*arglist)[0];
+      arg1 = (*arglist)[1];
+      arg1_type = TREE_TYPE (arg1);
+      arg2 = fold_for_warn ((*arglist)[2]);
+
+      if (TREE_CODE (arg1_type) != VECTOR_TYPE)
+	goto bad;
+      if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
+	goto bad;
+
+      /* If we can use the VSX xxpermdi instruction, use that for insert.  */
+      mode = TYPE_MODE (arg1_type);
+      if ((mode == V2DFmode || mode == V2DImode) && VECTOR_UNIT_VSX_P (mode)
+	  && TREE_CODE (arg2) == INTEGER_CST)
+	{
+	  wide_int selector = wi::to_wide (arg2);
+	  selector = wi::umod_trunc (selector, 2);
+	  tree call = NULL_TREE;
+
+	  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
+	  if (mode == V2DFmode)
+	    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V2DF];
+	  else if (mode == V2DImode)
+	    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V2DI];
+
+	  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
+	     reversed.  */
+	  if (call)
+	    return build_call_expr (call, 3, arg1, arg0, arg2);
+	}
+      else if (mode == V1TImode && VECTOR_UNIT_VSX_P (mode)
+	       && TREE_CODE (arg2) == INTEGER_CST)
+	{
+	  tree call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V1TI];
+	  wide_int selector = wi::zero (32);
+
+	  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
+	  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
+	     reversed.  */
+	  return build_call_expr (call, 3, arg1, arg0, arg2);
+	}
+
+      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0 with
+	 VIEW_CONVERT_EXPR.  i.e.:
+	 D.3192 = v1;
+	 _1 = n & 3;
+	 VIEW_CONVERT_EXPR<int[4]>(D.3192)[_1] = i;
+	 v1 = D.3192;
+	 D.3194 = v1;  */
+      if (TYPE_VECTOR_SUBPARTS (arg1_type) == 1)
+	arg2 = build_int_cst (TREE_TYPE (arg2), 0);
+      else
+	arg2 = build_binary_op (loc, BIT_AND_EXPR, arg2,
+				build_int_cst (TREE_TYPE (arg2),
+					       TYPE_VECTOR_SUBPARTS (arg1_type)
+					       - 1), 0);
+      decl = build_decl (loc, VAR_DECL, NULL_TREE, arg1_type);
+      DECL_EXTERNAL (decl) = 0;
+      TREE_PUBLIC (decl) = 0;
+      DECL_CONTEXT (decl) = current_function_decl;
+      TREE_USED (decl) = 1;
+      TREE_TYPE (decl) = arg1_type;
+      TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
+      TREE_ADDRESSABLE (decl) = 1;
+      if (c_dialect_cxx ())
+	{
+	  stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
+			 NULL_TREE, NULL_TREE);
+	  SET_EXPR_LOCATION (stmt, loc);
+	}
+      else
+	{
+	  DECL_INITIAL (decl) = arg1;
+	  stmt = build1 (DECL_EXPR, arg1_type, decl);
+	  SET_EXPR_LOCATION (stmt, loc);
+	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
+	}
+
+      if (TARGET_VSX)
+	{
+	  stmt = build_array_ref (loc, stmt, arg2);
+	  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
+			      convert (TREE_TYPE (stmt), arg0));
+	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
+	}
+      else
+	{
+	  tree arg1_inner_type;
+	  tree innerptrtype;
+	  arg1_inner_type = TREE_TYPE (arg1_type);
+	  innerptrtype = build_pointer_type (arg1_inner_type);
+
+	  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
+	  stmt = convert (innerptrtype, stmt);
+	  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
+	  stmt = build_indirect_ref (loc, stmt, RO_NULL);
+	  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
+			 convert (TREE_TYPE (stmt), arg0));
+	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
+	}
+      return stmt;
+    }
+
+  for (n = 0;
+       !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
+       fnargs = TREE_CHAIN (fnargs), n++)
+    {
+      tree decl_type = TREE_VALUE (fnargs);
+      tree arg = (*arglist)[n];
+      tree type;
+
+      if (arg == error_mark_node)
+	return error_mark_node;
+
+      if (n >= MAX_OVLD_ARGS)
+	abort ();
+
+      arg = default_conversion (arg);
+
+      /* The C++ front-end converts float * to const void * using
+	 NOP_EXPR<const void *> (NOP_EXPR<void *> (x)).  */
+      type = TREE_TYPE (arg);
+      if (POINTER_TYPE_P (type)
+	  && TREE_CODE (arg) == NOP_EXPR
+	  && lang_hooks.types_compatible_p (TREE_TYPE (arg),
+					    const_ptr_type_node)
+	  && lang_hooks.types_compatible_p (TREE_TYPE (TREE_OPERAND (arg, 0)),
+					    ptr_type_node))
+	{
+	  arg = TREE_OPERAND (arg, 0);
+	  type = TREE_TYPE (arg);
+	}
+
+      /* Remove the const from the pointers to simplify the overload
+	 matching further down.  */
+      if (POINTER_TYPE_P (decl_type)
+	  && POINTER_TYPE_P (type)
+	  && TYPE_QUALS (TREE_TYPE (type)) != 0)
+	{
+	  if (TYPE_READONLY (TREE_TYPE (type))
+	      && !TYPE_READONLY (TREE_TYPE (decl_type)))
+	    warning (0, "passing argument %d of %qE discards qualifiers from "
+		     "pointer target type", n + 1, fndecl);
+	  type = build_pointer_type (build_qualified_type (TREE_TYPE (type),
+							   0));
+	  arg = fold_convert (type, arg);
+	}
+
+      /* For RS6000_OVLD_VEC_LXVL, convert any const * to its non-constant
+	 equivalent to simplify the overload matching below.  */
+      if (fcode == RS6000_OVLD_VEC_LXVL)
+	{
+	  if (POINTER_TYPE_P (type)
+	      && TYPE_READONLY (TREE_TYPE (type)))
+	    {
+	      type = build_pointer_type (build_qualified_type
+					 (TREE_TYPE (type), 0));
+	      arg = fold_convert (type, arg);
+	    }
+	}
+
+      args[n] = arg;
+      types[n] = type;
+    }
+
+  /* If the number of arguments did not match the prototype, return NULL
+     and the generic code will issue the appropriate error message.  */
+  if (!VOID_TYPE_P (TREE_VALUE (fnargs)) || n < nargs)
+    return NULL;
+
+  if (fcode == RS6000_OVLD_VEC_STEP)
+    {
+      if (TREE_CODE (types[0]) != VECTOR_TYPE)
+	goto bad;
+
+      return build_int_cst (NULL_TREE, TYPE_VECTOR_SUBPARTS (types[0]));
+    }
+
+  {
+    bool unsupported_builtin = false;
+    enum rs6000_gen_builtins overloaded_code;
+    bool supported = false;
+    ovlddata *instance = rs6000_overload_info[adj_fcode].first_instance;
+    gcc_assert (instance != NULL);
+
+    /* Need to special case __builtin_cmpb because the overloaded forms
+       of this function take (unsigned int, unsigned int) or (unsigned
+       long long int, unsigned long long int).  Since C conventions
+       allow the respective argument types to be implicitly coerced into
+       each other, the default handling does not provide adequate
+       discrimination between the desired forms of the function.  */
+    if (fcode == RS6000_OVLD_SCAL_CMPB)
+      {
+	machine_mode arg1_mode = TYPE_MODE (types[0]);
+	machine_mode arg2_mode = TYPE_MODE (types[1]);
+
+	if (nargs != 2)
+	  {
+	    error ("builtin %qs only accepts 2 arguments", "__builtin_cmpb");
+	    return error_mark_node;
+	  }
+
+	/* If any supplied arguments are wider than 32 bits, resolve to
+	   64-bit variant of built-in function.  */
+	if ((GET_MODE_PRECISION (arg1_mode) > 32)
+	    || (GET_MODE_PRECISION (arg2_mode) > 32))
+	  {
+	    /* Ensure all argument and result types are compatible with
+	       the built-in function represented by RS6000_BIF_CMPB.  */
+	    overloaded_code = RS6000_BIF_CMPB;
+	  }
+	else
+	  {
+	    /* Ensure all argument and result types are compatible with
+	       the built-in function represented by RS6000_BIF_CMPB_32.  */
+	    overloaded_code = RS6000_BIF_CMPB_32;
+	  }
+
+	while (instance && instance->bifid != overloaded_code)
+	  instance = instance->next;
+
+	gcc_assert (instance != NULL);
+	tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
+	tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
+	tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+
+	if (rs6000_new_builtin_type_compatible (types[0], parmtype0)
+	    && rs6000_new_builtin_type_compatible (types[1], parmtype1))
+	  {
+	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
+		&& rs6000_new_builtin_is_supported (instance->bifid))
+	      {
+		tree ret_type = TREE_TYPE (instance->fntype);
+		return altivec_build_new_resolved_builtin (args, n, fntype,
+							   ret_type,
+							   instance->bifid,
+							   fcode);
+	      }
+	    else
+	      unsupported_builtin = true;
+	  }
+      }
+    else if (fcode == RS6000_OVLD_VEC_VSIE)
+      {
+	machine_mode arg1_mode = TYPE_MODE (types[0]);
+
+	if (nargs != 2)
+	  {
+	    error ("builtin %qs only accepts 2 arguments",
+		   "scalar_insert_exp");
+	    return error_mark_node;
+	  }
+
+	/* If supplied first argument is wider than 64 bits, resolve to
+	   128-bit variant of built-in function.  */
+	if (GET_MODE_PRECISION (arg1_mode) > 64)
+	  {
+	    /* If first argument is of float variety, choose variant
+	       that expects __ieee128 argument.  Otherwise, expect
+	       __int128 argument.  */
+	    if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
+	      overloaded_code = RS6000_BIF_VSIEQPF;
+	    else
+	      overloaded_code = RS6000_BIF_VSIEQP;
+	  }
+	else
+	  {
+	    /* If first argument is of float variety, choose variant
+	       that expects double argument.  Otherwise, expect
+	       long long int argument.  */
+	    if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
+	      overloaded_code = RS6000_BIF_VSIEDPF;
+	    else
+	      overloaded_code = RS6000_BIF_VSIEDP;
+	  }
+
+	while (instance && instance->bifid != overloaded_code)
+	  instance = instance->next;
+
+	gcc_assert (instance != NULL);
+	tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
+	tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
+	tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
+
+	if (rs6000_new_builtin_type_compatible (types[0], parmtype0)
+	    && rs6000_new_builtin_type_compatible (types[1], parmtype1))
+	  {
+	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
+		&& rs6000_new_builtin_is_supported (instance->bifid))
+	      {
+		tree ret_type = TREE_TYPE (instance->fntype);
+		return altivec_build_new_resolved_builtin (args, n, fntype,
+							   ret_type,
+							   instance->bifid,
+							   fcode);
+	      }
+	    else
+	      unsupported_builtin = true;
+	  }
+      }
+    else
+      {
+	/* Functions with no arguments can have only one overloaded
+	   instance.  */
+	gcc_assert (n > 0 || !instance->next);
+
+	for (; instance != NULL; instance = instance->next)
+	  {
+	    bool mismatch = false;
+	    tree nextparm = TYPE_ARG_TYPES (instance->fntype);
+
+	    for (unsigned int arg_i = 0;
+		 arg_i < nargs && nextparm != NULL;
+		 arg_i++)
+	      {
+		tree parmtype = TREE_VALUE (nextparm);
+		if (!rs6000_new_builtin_type_compatible (types[arg_i],
+							 parmtype))
+		  {
+		    mismatch = true;
+		    break;
+		  }
+		nextparm = TREE_CHAIN (nextparm);
+	      }
+
+	    if (mismatch)
+	      continue;
+
+	    supported = rs6000_new_builtin_is_supported (instance->bifid);
+	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
+		&& supported)
+	      {
+		tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
+		tree ret_type = TREE_TYPE (instance->fntype);
+		return altivec_build_new_resolved_builtin (args, n, fntype,
+							   ret_type,
+							   instance->bifid,
+							   fcode);
+	      }
+	    else
+	      {
+		unsupported_builtin = true;
+		break;
+	      }
+	  }
+      }
+
+    if (unsupported_builtin)
+      {
+	const char *name = rs6000_overload_info[adj_fcode].ovld_name;
+	if (!supported)
+	  {
+	    const char *internal_name
+	      = rs6000_builtin_info_x[instance->bifid].bifname;
+	    /* An error message making reference to the name of the
+	       non-overloaded function has already been issued.  Add
+	       clarification of the previous message.  */
+	    rich_location richloc (line_table, input_location);
+	    inform (&richloc, "builtin %qs requires builtin %qs",
+		    name, internal_name);
+	  }
+	else
+	  error ("%qs is not supported in this compiler configuration", name);
+	return error_mark_node;
+      }
+  }
+ bad:
+  {
+    const char *name = rs6000_overload_info[adj_fcode].ovld_name;
+    error ("invalid parameter combination for AltiVec intrinsic %qs", name);
+    return error_mark_node;
+  }
+}
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e8625d17d18..2c68aa3580c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -12971,6 +12971,59 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return false;
 }
 
+/* Check whether a builtin function is supported in this target
+   configuration.  */
+bool
+rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
+{
+  switch (rs6000_builtin_info_x[(size_t) fncode].enable)
+    {
+    default:
+      gcc_unreachable ();
+    case ENB_ALWAYS:
+      return true;
+    case ENB_P5:
+      return TARGET_POPCNTB;
+    case ENB_P6:
+      return TARGET_CMPB;
+    case ENB_ALTIVEC:
+      return TARGET_ALTIVEC;
+    case ENB_CELL:
+      return TARGET_ALTIVEC && rs6000_cpu == PROCESSOR_CELL;
+    case ENB_VSX:
+      return TARGET_VSX;
+    case ENB_P7:
+      return TARGET_POPCNTD;
+    case ENB_P7_64:
+      return TARGET_POPCNTD && TARGET_POWERPC64;
+    case ENB_P8:
+      return TARGET_DIRECT_MOVE;
+    case ENB_P8V:
+      return TARGET_P8_VECTOR;
+    case ENB_P9:
+      return TARGET_MODULO;
+    case ENB_P9_64:
+      return TARGET_MODULO && TARGET_POWERPC64;
+    case ENB_P9V:
+      return TARGET_P9_VECTOR;
+    case ENB_IEEE128_HW:
+      return TARGET_FLOAT128_HW;
+    case ENB_DFP:
+      return TARGET_DFP;
+    case ENB_CRYPTO:
+      return TARGET_CRYPTO;
+    case ENB_HTM:
+      return TARGET_HTM;
+    case ENB_P10:
+      return TARGET_POWER10;
+    case ENB_P10_64:
+      return TARGET_POWER10 && TARGET_POWERPC64;
+    case ENB_MMA:
+      return TARGET_MMA;
+    }
+  gcc_unreachable ();
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index f3d6156400a..f65932e1cd5 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -2314,7 +2314,7 @@ write_decls (void)
 
   fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n");
   fprintf (header_file,
-	   "extern bool rs6000_new_builtin_is_supported_p "
+	   "extern bool rs6000_new_builtin_is_supported "
 	   "(rs6000_gen_builtins);\n");
   fprintf (header_file,
 	   "extern tree rs6000_builtin_decl (unsigned, "
-- 
2.27.0



* [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
  2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-09-13 17:53   ` will schmidt
  2021-09-16 22:52   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 03/18] rs6000: Handle gimple folding of target built-ins Bill Schmidt
                   ` (16 subsequent siblings)
  18 siblings, 2 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

I over-restricted use of __builtin_mffsl, since I was unaware that it
automatically uses mffs when mffsl is not available.  Paul Clarke pointed
this out in discussion of his SSE 4.1 compatibility patches.
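
For reference, a small usage sketch (not from the patch): because of
the automatic fallback, code like this is safe to build for older
processors as well.

  /* Read the FPSCR control bits (e.g. the rounding mode).  This uses
     mffsl on POWER9 and later, and falls back to mffs on older
     processors.  */
  double
  get_fpscr_control (void)
  {
    return __builtin_mffsl ();
  }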

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (__builtin_mffsl): Move from [power9]
	to [always].
---
 gcc/config/rs6000/rs6000-builtin-new.def | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 6a28d5189f8..a8c6b9e988f 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -208,6 +208,12 @@
   double __builtin_mffs ();
     MFFS rs6000_mffs {}
 
+; Although the mffsl instruction is only available on POWER9 and later
+; processors, this builtin automatically falls back to mffs on older
+; platforms.  Thus it appears here in the [always] stanza.
+  double __builtin_mffsl ();
+    MFFSL rs6000_mffsl {}
+
 ; This thing really assumes long double == __ibm128, and I'm told it has
 ; been used as such within libgcc.  Given that __builtin_pack_ibm128
 ; exists for the same purpose, this should really not be used at all.
@@ -2784,9 +2790,6 @@
   signed long long __builtin_darn_raw ();
     DARN_RAW darn_raw {}
 
-  double __builtin_mffsl ();
-    MFFSL rs6000_mffsl {}
-
   const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
     TSTSFI_EQ_DD dfptstsfi_eq_dd {}
 
-- 
2.27.0



* [PATCH 03/18] rs6000: Handle gimple folding of target built-ins
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
  2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
  2021-09-01 16:13 ` [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-09-13 18:42   ` will schmidt
  2021-09-16 22:58   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 04/18] rs6000: Handle some recent MMA builtin changes Bill Schmidt
                   ` (15 subsequent siblings)
  18 siblings, 2 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

This is another patch that looks bigger than it really is.  Because we
have a new namespace for the builtins, which allows the old and new
builtin infrastructures to be supported at once, we need versions of
these functions that use the new builtin namespace.  Otherwise the code
is unchanged.
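
For context, gimple folding lowers simple builtins into generic GIMPLE
early, so the rest of the middle end can optimize them.  A sketch of
the effect (illustrative only):

  vector int
  f (vector int a, vector int b)
  {
    /* With folding, this builtin call becomes a plain vector
       addition in GIMPLE:  _1 = a + b;  */
    return vec_add (a, b);
  }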

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
	New forward decl.
	(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
	(rs6000_new_builtin_valid_without_lhs): New function.
	(rs6000_gimple_fold_new_mma_builtin): Likewise.
	(rs6000_gimple_fold_new_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 1165 +++++++++++++++++++++++++++++++
 1 file changed, 1165 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 2c68aa3580c..eae4e15df1e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
 /* Hash table to keep track of the argument types for builtin functions.  */
@@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
 bool
 rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
 {
+  if (new_builtins_are_live)
+    return rs6000_gimple_fold_new_builtin (gsi);
+
   gimple *stmt = gsi_stmt (*gsi);
   tree fndecl = gimple_call_fndecl (stmt);
   gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
@@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
   return false;
 }
 
+/* Helper function to sort out which built-ins may be valid without having
+   a LHS.  */
+static bool
+rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
+				      tree fndecl)
+{
+  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
+    return true;
+
+  switch (fn_code)
+    {
+    case RS6000_BIF_STVX_V16QI:
+    case RS6000_BIF_STVX_V8HI:
+    case RS6000_BIF_STVX_V4SI:
+    case RS6000_BIF_STVX_V4SF:
+    case RS6000_BIF_STVX_V2DI:
+    case RS6000_BIF_STVX_V2DF:
+    case RS6000_BIF_STXVW4X_V16QI:
+    case RS6000_BIF_STXVW4X_V8HI:
+    case RS6000_BIF_STXVW4X_V4SF:
+    case RS6000_BIF_STXVW4X_V4SI:
+    case RS6000_BIF_STXVD2X_V2DF:
+    case RS6000_BIF_STXVD2X_V2DI:
+      return true;
+    default:
+      return false;
+    }
+}
+
 /* Check whether a builtin function is supported in this target
    configuration.  */
 bool
@@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
   gcc_unreachable ();
 }
 
+/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
+   __vector_quad arguments into pass-by-value arguments, leading to more
+   efficient code generation.  */
+static bool
+rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
+				    rs6000_gen_builtins fn_code)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  size_t fncode = (size_t) fn_code;
+
+  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
+    return false;
+
+  /* Each call that can be gimple-expanded has an associated built-in
+     function that it will expand into.  If this one doesn't, we have
+     already expanded it!  */
+  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
+    return false;
+
+  bifdata *bd = &rs6000_builtin_info_x[fncode];
+  unsigned nopnds = bd->nargs;
+  gimple_seq new_seq = NULL;
+  gimple *new_call;
+  tree new_decl;
+
+  /* Compatibility built-ins; we used to call these
+     __builtin_mma_{dis,}assemble_pair, but now we call them
+     __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
+  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
+    fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
+  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
+    fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
+
+  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
+      || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
+    {
+      /* This is an MMA disassemble built-in function.  */
+      push_gimplify_context (true);
+      unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
+      tree dst_ptr = gimple_call_arg (stmt, 0);
+      tree src_ptr = gimple_call_arg (stmt, 1);
+      tree src_type = TREE_TYPE (src_ptr);
+      tree src = create_tmp_reg_or_ssa_name (TREE_TYPE (src_type));
+      gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
+
+      /* If we are not disassembling an accumulator/pair or our destination is
+	 another accumulator/pair, then just copy the entire thing as is.  */
+      if ((fncode == RS6000_BIF_DISASSEMBLE_ACC
+	   && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
+	  || (fncode == RS6000_BIF_DISASSEMBLE_PAIR_V
+	      && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_pair_type_node))
+	{
+	  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
+						   src_type, dst_ptr));
+	  gimplify_assign (dst, src, &new_seq);
+	  pop_gimplify_context (NULL);
+	  gsi_replace_with_seq (gsi, new_seq, true);
+	  return true;
+	}
+
+      /* If we're disassembling an accumulator into a different type, we need
+	 to emit a xxmfacc instruction now, since we cannot do it later.  */
+      if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
+	{
+	  new_decl = rs6000_builtin_decls_x[RS6000_BIF_XXMFACC_INTERNAL];
+	  new_call = gimple_build_call (new_decl, 1, src);
+	  src = create_tmp_reg_or_ssa_name (vector_quad_type_node);
+	  gimple_call_set_lhs (new_call, src);
+	  gimple_seq_add_stmt (&new_seq, new_call);
+	}
+
+      /* Copy the accumulator/pair vector by vector.  */
+      new_decl
+	= rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
+      tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
+						   ptr_mode, true);
+      tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
+      for (unsigned i = 0; i < nvec; i++)
+	{
+	  unsigned index = WORDS_BIG_ENDIAN ? i : nvec - 1 - i;
+	  tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
+			     build_int_cst (dst_type, index * 16));
+	  tree dstssa = create_tmp_reg_or_ssa_name (unsigned_V16QI_type_node);
+	  new_call = gimple_build_call (new_decl, 2, src,
+					build_int_cstu (uint16_type_node, i));
+	  gimple_call_set_lhs (new_call, dstssa);
+	  gimple_seq_add_stmt (&new_seq, new_call);
+	  gimplify_assign (dst, dstssa, &new_seq);
+	}
+      pop_gimplify_context (NULL);
+      gsi_replace_with_seq (gsi, new_seq, true);
+      return true;
+    }
+
+  /* Convert this built-in into an internal version that uses pass-by-value
+     arguments.  The internal built-in is found in the assoc_bif field.  */
+  new_decl = rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
+  tree lhs, op[MAX_MMA_OPERANDS];
+  tree acc = gimple_call_arg (stmt, 0);
+  push_gimplify_context (true);
+
+  if (bif_is_quad (*bd))
+    {
+      /* This built-in has a pass-by-reference accumulator input, so load it
+	 into a temporary accumulator for use as a pass-by-value input.  */
+      op[0] = create_tmp_reg_or_ssa_name (vector_quad_type_node);
+      for (unsigned i = 1; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i);
+      gimplify_assign (op[0], build_simple_mem_ref (acc), &new_seq);
+    }
+  else
+    {
+      /* This built-in does not use its pass-by-reference accumulator argument
+	 as an input argument, so remove it from the input list.  */
+      nopnds--;
+      for (unsigned i = 0; i < nopnds; i++)
+	op[i] = gimple_call_arg (stmt, i + 1);
+    }
+
+  switch (nopnds)
+    {
+    case 0:
+      new_call = gimple_build_call (new_decl, 0);
+      break;
+    case 1:
+      new_call = gimple_build_call (new_decl, 1, op[0]);
+      break;
+    case 2:
+      new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
+      break;
+    case 3:
+      new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
+      break;
+    case 4:
+      new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
+				    op[4]);
+      break;
+    case 6:
+      new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
+				    op[4], op[5]);
+      break;
+    case 7:
+      new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
+				    op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  if (fncode == RS6000_BIF_BUILD_PAIR || fncode == RS6000_BIF_ASSEMBLE_PAIR_V)
+    lhs = create_tmp_reg_or_ssa_name (vector_pair_type_node);
+  else
+    lhs = create_tmp_reg_or_ssa_name (vector_quad_type_node);
+  gimple_call_set_lhs (new_call, lhs);
+  gimple_seq_add_stmt (&new_seq, new_call);
+  gimplify_assign (build_simple_mem_ref (acc), lhs, &new_seq);
+  pop_gimplify_context (NULL);
+  gsi_replace_with_seq (gsi, new_seq, true);
+
+  return true;
+}
+
+/* Fold a machine-dependent built-in in GIMPLE.  (For folding into
+   a constant, use rs6000_fold_builtin.)  */
+static bool
+rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi)
+{
+  gimple *stmt = gsi_stmt (*gsi);
+  tree fndecl = gimple_call_fndecl (stmt);
+  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
+  enum rs6000_gen_builtins fn_code
+    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  tree arg0, arg1, lhs, temp;
+  enum tree_code bcode;
+  gimple *g;
+
+  size_t uns_fncode = (size_t) fn_code;
+  enum insn_code icode = rs6000_builtin_info_x[uns_fncode].icode;
+  const char *fn_name1 = rs6000_builtin_info_x[uns_fncode].bifname;
+  const char *fn_name2 = (icode != CODE_FOR_nothing)
+			  ? get_insn_name ((int) icode)
+			  : "nothing";
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "rs6000_gimple_fold_new_builtin %d %s %s\n",
+	     fn_code, fn_name1, fn_name2);
+
+  if (!rs6000_fold_gimple)
+    return false;
+
+  /* Prevent gimple folding for code that does not have an LHS, unless it is
+     allowed per the rs6000_new_builtin_valid_without_lhs helper function.  */
+  if (!gimple_call_lhs (stmt)
+      && !rs6000_new_builtin_valid_without_lhs (fn_code, fndecl))
+    return false;
+
+  /* Don't fold invalid builtins; let rs6000_expand_builtin diagnose them.  */
+  if (!rs6000_new_builtin_is_supported (fn_code))
+    return false;
+
+  if (rs6000_gimple_fold_new_mma_builtin (gsi, fn_code))
+    return true;
+
+  switch (fn_code)
+    {
+    /* Flavors of vec_add.  We deliberately don't expand
+       RS6000_BIF_VADDUQM as it gets lowered from V1TImode to
+       TImode, resulting in much poorer code generation.  */
+    case RS6000_BIF_VADDUBM:
+    case RS6000_BIF_VADDUHM:
+    case RS6000_BIF_VADDUWM:
+    case RS6000_BIF_VADDUDM:
+    case RS6000_BIF_VADDFP:
+    case RS6000_BIF_XVADDDP:
+    case RS6000_BIF_XVADDSP:
+      bcode = PLUS_EXPR;
+    do_binary:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (lhs)))
+	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (lhs))))
+	{
+	  /* Ensure the binary operation is performed in a type
+	     that wraps if it is an integral type.  */
+	  gimple_seq stmts = NULL;
+	  tree type = unsigned_type_for (TREE_TYPE (lhs));
+	  tree uarg0 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+				     type, arg0);
+	  tree uarg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+				     type, arg1);
+	  tree res = gimple_build (&stmts, gimple_location (stmt), bcode,
+				   type, uarg0, uarg1);
+	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR,
+				   build1 (VIEW_CONVERT_EXPR,
+					   TREE_TYPE (lhs), res));
+	  gsi_replace (gsi, g, true);
+	  return true;
+	}
+      g = gimple_build_assign (lhs, bcode, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_sub.  We deliberately don't expand
+       RS6000_BIF_VSUBUQM, for the same reason as RS6000_BIF_VADDUQM.  */
+    case RS6000_BIF_VSUBUBM:
+    case RS6000_BIF_VSUBUHM:
+    case RS6000_BIF_VSUBUWM:
+    case RS6000_BIF_VSUBUDM:
+    case RS6000_BIF_VSUBFP:
+    case RS6000_BIF_XVSUBDP:
+    case RS6000_BIF_XVSUBSP:
+      bcode = MINUS_EXPR;
+      goto do_binary;
+    case RS6000_BIF_XVMULSP:
+    case RS6000_BIF_XVMULDP:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, MULT_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Even element flavors of vec_mul (signed).  */
+    case RS6000_BIF_VMULESB:
+    case RS6000_BIF_VMULESH:
+    case RS6000_BIF_VMULESW:
+    /* Even element flavors of vec_mul (unsigned).  */
+    case RS6000_BIF_VMULEUB:
+    case RS6000_BIF_VMULEUH:
+    case RS6000_BIF_VMULEUW:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_EVEN_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Odd element flavors of vec_mul (signed).  */
+    case RS6000_BIF_VMULOSB:
+    case RS6000_BIF_VMULOSH:
+    case RS6000_BIF_VMULOSW:
+    /* Odd element flavors of vec_mul (unsigned).  */
+    case RS6000_BIF_VMULOUB:
+    case RS6000_BIF_VMULOUH:
+    case RS6000_BIF_VMULOUW:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_ODD_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_div (Integer).  */
+    case RS6000_BIF_DIV_V2DI:
+    case RS6000_BIF_UDIV_V2DI:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, TRUNC_DIV_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_div (Float).  */
+    case RS6000_BIF_XVDIVSP:
+    case RS6000_BIF_XVDIVDP:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, RDIV_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_and.  */
+    case RS6000_BIF_VAND_V16QI_UNS:
+    case RS6000_BIF_VAND_V16QI:
+    case RS6000_BIF_VAND_V8HI_UNS:
+    case RS6000_BIF_VAND_V8HI:
+    case RS6000_BIF_VAND_V4SI_UNS:
+    case RS6000_BIF_VAND_V4SI:
+    case RS6000_BIF_VAND_V2DI_UNS:
+    case RS6000_BIF_VAND_V2DI:
+    case RS6000_BIF_VAND_V4SF:
+    case RS6000_BIF_VAND_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_andc.  */
+    case RS6000_BIF_VANDC_V16QI_UNS:
+    case RS6000_BIF_VANDC_V16QI:
+    case RS6000_BIF_VANDC_V8HI_UNS:
+    case RS6000_BIF_VANDC_V8HI:
+    case RS6000_BIF_VANDC_V4SI_UNS:
+    case RS6000_BIF_VANDC_V4SI:
+    case RS6000_BIF_VANDC_V2DI_UNS:
+    case RS6000_BIF_VANDC_V2DI:
+    case RS6000_BIF_VANDC_V4SF:
+    case RS6000_BIF_VANDC_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
+      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, temp);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_nand.  */
+    case RS6000_BIF_NAND_V16QI_UNS:
+    case RS6000_BIF_NAND_V16QI:
+    case RS6000_BIF_NAND_V8HI_UNS:
+    case RS6000_BIF_NAND_V8HI:
+    case RS6000_BIF_NAND_V4SI_UNS:
+    case RS6000_BIF_NAND_V4SI:
+    case RS6000_BIF_NAND_V2DI_UNS:
+    case RS6000_BIF_NAND_V2DI:
+    case RS6000_BIF_NAND_V4SF:
+    case RS6000_BIF_NAND_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
+      g = gimple_build_assign (temp, BIT_AND_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_or.  */
+    case RS6000_BIF_VOR_V16QI_UNS:
+    case RS6000_BIF_VOR_V16QI:
+    case RS6000_BIF_VOR_V8HI_UNS:
+    case RS6000_BIF_VOR_V8HI:
+    case RS6000_BIF_VOR_V4SI_UNS:
+    case RS6000_BIF_VOR_V4SI:
+    case RS6000_BIF_VOR_V2DI_UNS:
+    case RS6000_BIF_VOR_V2DI:
+    case RS6000_BIF_VOR_V4SF:
+    case RS6000_BIF_VOR_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_orc.  */
+    case RS6000_BIF_ORC_V16QI_UNS:
+    case RS6000_BIF_ORC_V16QI:
+    case RS6000_BIF_ORC_V8HI_UNS:
+    case RS6000_BIF_ORC_V8HI:
+    case RS6000_BIF_ORC_V4SI_UNS:
+    case RS6000_BIF_ORC_V4SI:
+    case RS6000_BIF_ORC_V2DI_UNS:
+    case RS6000_BIF_ORC_V2DI:
+    case RS6000_BIF_ORC_V4SF:
+    case RS6000_BIF_ORC_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
+      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, temp);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_xor.  */
+    case RS6000_BIF_VXOR_V16QI_UNS:
+    case RS6000_BIF_VXOR_V16QI:
+    case RS6000_BIF_VXOR_V8HI_UNS:
+    case RS6000_BIF_VXOR_V8HI:
+    case RS6000_BIF_VXOR_V4SI_UNS:
+    case RS6000_BIF_VXOR_V4SI:
+    case RS6000_BIF_VXOR_V2DI_UNS:
+    case RS6000_BIF_VXOR_V2DI:
+    case RS6000_BIF_VXOR_V4SF:
+    case RS6000_BIF_VXOR_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, BIT_XOR_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_nor.  */
+    case RS6000_BIF_VNOR_V16QI_UNS:
+    case RS6000_BIF_VNOR_V16QI:
+    case RS6000_BIF_VNOR_V8HI_UNS:
+    case RS6000_BIF_VNOR_V8HI:
+    case RS6000_BIF_VNOR_V4SI_UNS:
+    case RS6000_BIF_VNOR_V4SI:
+    case RS6000_BIF_VNOR_V2DI_UNS:
+    case RS6000_BIF_VNOR_V2DI:
+    case RS6000_BIF_VNOR_V4SF:
+    case RS6000_BIF_VNOR_V2DF:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
+      g = gimple_build_assign (temp, BIT_IOR_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_abs.  */
+    case RS6000_BIF_ABS_V16QI:
+    case RS6000_BIF_ABS_V8HI:
+    case RS6000_BIF_ABS_V4SI:
+    case RS6000_BIF_ABS_V4SF:
+    case RS6000_BIF_ABS_V2DI:
+    case RS6000_BIF_XVABSDP:
+    case RS6000_BIF_XVABSSP:
+      arg0 = gimple_call_arg (stmt, 0);
+      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0)))
+	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (arg0))))
+	return false;
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, ABS_EXPR, arg0);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_min.  */
+    case RS6000_BIF_XVMINDP:
+    case RS6000_BIF_XVMINSP:
+    case RS6000_BIF_VMINSD:
+    case RS6000_BIF_VMINUD:
+    case RS6000_BIF_VMINSB:
+    case RS6000_BIF_VMINSH:
+    case RS6000_BIF_VMINSW:
+    case RS6000_BIF_VMINUB:
+    case RS6000_BIF_VMINUH:
+    case RS6000_BIF_VMINUW:
+    case RS6000_BIF_VMINFP:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, MIN_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_max.  */
+    case RS6000_BIF_XVMAXDP:
+    case RS6000_BIF_XVMAXSP:
+    case RS6000_BIF_VMAXSD:
+    case RS6000_BIF_VMAXUD:
+    case RS6000_BIF_VMAXSB:
+    case RS6000_BIF_VMAXSH:
+    case RS6000_BIF_VMAXSW:
+    case RS6000_BIF_VMAXUB:
+    case RS6000_BIF_VMAXUH:
+    case RS6000_BIF_VMAXUW:
+    case RS6000_BIF_VMAXFP:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, MAX_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_eqv.  */
+    case RS6000_BIF_EQV_V16QI:
+    case RS6000_BIF_EQV_V8HI:
+    case RS6000_BIF_EQV_V4SI:
+    case RS6000_BIF_EQV_V4SF:
+    case RS6000_BIF_EQV_V2DF:
+    case RS6000_BIF_EQV_V2DI:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
+      g = gimple_build_assign (temp, BIT_XOR_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_insert_before (gsi, g, GSI_SAME_STMT);
+      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vec_rotate_left.  */
+    case RS6000_BIF_VRLB:
+    case RS6000_BIF_VRLH:
+    case RS6000_BIF_VRLW:
+    case RS6000_BIF_VRLD:
+      arg0 = gimple_call_arg (stmt, 0);
+      arg1 = gimple_call_arg (stmt, 1);
+      lhs = gimple_call_lhs (stmt);
+      g = gimple_build_assign (lhs, LROTATE_EXPR, arg0, arg1);
+      gimple_set_location (g, gimple_location (stmt));
+      gsi_replace (gsi, g, true);
+      return true;
+    /* Flavors of vector shift right algebraic.
+       vec_sra{b,h,w} -> vsra{b,h,w}.  */
+    case RS6000_BIF_VSRAB:
+    case RS6000_BIF_VSRAH:
+    case RS6000_BIF_VSRAW:
+    case RS6000_BIF_VSRAD:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	lhs = gimple_call_lhs (stmt);
+	tree arg1_type = TREE_TYPE (arg1);
+	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
+	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
+	location_t loc = gimple_location (stmt);
+	/* Force arg1 into the valid range for the arg0 type.  */
+	/* Build a vector consisting of the max valid bit-size values.  */
+	int n_elts = VECTOR_CST_NELTS (arg1);
+	tree element_size = build_int_cst (unsigned_element_type,
+					   128 / n_elts);
+	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
+	for (int i = 0; i < n_elts; i++)
+	  elts.safe_push (element_size);
+	tree modulo_tree = elts.build ();
+	/* Modulo the provided shift value against that vector.  */
+	gimple_seq stmts = NULL;
+	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+					   unsigned_arg1_type, arg1);
+	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
+				      unsigned_arg1_type, unsigned_arg1,
+				      modulo_tree);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	/* And finally, do the shift.  */
+	g = gimple_build_assign (lhs, RSHIFT_EXPR, arg0, new_arg1);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    /* Flavors of vector shift left.
+       builtin_altivec_vsl{b,h,w} -> vsl{b,h,w}.  */
+    case RS6000_BIF_VSLB:
+    case RS6000_BIF_VSLH:
+    case RS6000_BIF_VSLW:
+    case RS6000_BIF_VSLD:
+      {
+	location_t loc;
+	gimple_seq stmts = NULL;
+	arg0 = gimple_call_arg (stmt, 0);
+	tree arg0_type = TREE_TYPE (arg0);
+	if (INTEGRAL_TYPE_P (TREE_TYPE (arg0_type))
+	    && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0_type)))
+	  return false;
+	arg1 = gimple_call_arg (stmt, 1);
+	tree arg1_type = TREE_TYPE (arg1);
+	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
+	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
+	loc = gimple_location (stmt);
+	lhs = gimple_call_lhs (stmt);
+	/* Force arg1 into the valid range for the arg0 type.  */
+	/* Build a vector consisting of the max valid bit-size values.  */
+	int n_elts = VECTOR_CST_NELTS (arg1);
+	int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg1_type))
+				* BITS_PER_UNIT;
+	tree element_size = build_int_cst (unsigned_element_type,
+					   tree_size_in_bits / n_elts);
+	tree_vector_builder elts (unsigned_type_for (arg1_type), n_elts, 1);
+	for (int i = 0; i < n_elts; i++)
+	  elts.safe_push (element_size);
+	tree modulo_tree = elts.build ();
+	/* Modulo the provided shift value against that vector.  */
+	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+					   unsigned_arg1_type, arg1);
+	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
+				      unsigned_arg1_type, unsigned_arg1,
+				      modulo_tree);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	/* And finally, do the shift.  */
+	g = gimple_build_assign (lhs, LSHIFT_EXPR, arg0, new_arg1);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    /* Flavors of vector shift right.  */
+    case RS6000_BIF_VSRB:
+    case RS6000_BIF_VSRH:
+    case RS6000_BIF_VSRW:
+    case RS6000_BIF_VSRD:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	lhs = gimple_call_lhs (stmt);
+	tree arg1_type = TREE_TYPE (arg1);
+	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
+	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
+	location_t loc = gimple_location (stmt);
+	gimple_seq stmts = NULL;
+	/* Convert arg0 to unsigned.  */
+	tree arg0_unsigned
+	  = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+			  unsigned_type_for (TREE_TYPE (arg0)), arg0);
+	/* Force arg1 into the valid range for the arg0 type.  */
+	/* Build a vector consisting of the max valid bit-size values.  */
+	int n_elts = VECTOR_CST_NELTS (arg1);
+	tree element_size = build_int_cst (unsigned_element_type,
+					   128 / n_elts);
+	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
+	for (int i = 0; i < n_elts; i++)
+	  elts.safe_push (element_size);
+	tree modulo_tree = elts.build ();
+	/* Modulo the provided shift value against that vector.  */
+	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
+					   unsigned_arg1_type, arg1);
+	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
+				      unsigned_arg1_type, unsigned_arg1,
+				      modulo_tree);
+	/* Do the shift.  */
+	tree res
+	  = gimple_build (&stmts, RSHIFT_EXPR,
+			  TREE_TYPE (arg0_unsigned), arg0_unsigned, new_arg1);
+	/* Convert result back to the lhs type.  */
+	res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	replace_call_with_value (gsi, res);
+	return true;
+      }
+    /* Vector loads.  */
+    case RS6000_BIF_LVX_V16QI:
+    case RS6000_BIF_LVX_V8HI:
+    case RS6000_BIF_LVX_V4SI:
+    case RS6000_BIF_LVX_V4SF:
+    case RS6000_BIF_LVX_V2DI:
+    case RS6000_BIF_LVX_V2DF:
+    case RS6000_BIF_LVX_V1TI:
+      {
+	arg0 = gimple_call_arg (stmt, 0);  // offset
+	arg1 = gimple_call_arg (stmt, 1);  // address
+	lhs = gimple_call_lhs (stmt);
+	location_t loc = gimple_location (stmt);
+	/* Since arg1 may be cast to a different type, just use ptr_type_node
+	   here instead of trying to enforce TBAA on pointer types.  */
+	tree arg1_type = ptr_type_node;
+	tree lhs_type = TREE_TYPE (lhs);
+	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+	   the tree using the value from arg0.  The resulting type will match
+	   the type of arg1.  */
+	gimple_seq stmts = NULL;
+	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
+	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg1_type, arg1, temp_offset);
+	/* Mask off any lower bits from the address.  */
+	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
+					  arg1_type, temp_addr,
+					  build_int_cst (arg1_type, -16));
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	if (!is_gimple_mem_ref_addr (aligned_addr))
+	  {
+	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
+	    gimple *g = gimple_build_assign (t, aligned_addr);
+	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	    aligned_addr = t;
+	  }
+	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
+	   take an offset, but since we've already incorporated the offset
+	   above, here we just pass in a zero.  */
+	gimple *g
+	  = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
+					      build_int_cst (arg1_type, 0)));
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    /* Vector stores.  */
+    case RS6000_BIF_STVX_V16QI:
+    case RS6000_BIF_STVX_V8HI:
+    case RS6000_BIF_STVX_V4SI:
+    case RS6000_BIF_STVX_V4SF:
+    case RS6000_BIF_STVX_V2DI:
+    case RS6000_BIF_STVX_V2DF:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+	location_t loc = gimple_location (stmt);
+	tree arg0_type = TREE_TYPE (arg0);
+	/* Use ptr_type_node (no TBAA) for the arg2_type.
+	   FIXME: (Richard)  "A proper fix would be to transition this type as
+	   seen from the frontend to GIMPLE, for example in a similar way we
+	   do for MEM_REFs by piggy-backing that on an extra argument, a
+	   constant zero pointer of the alias pointer type to use (which would
+	   also serve as a type indicator of the store itself).  I'd use a
+	   target specific internal function for this (not sure if we can have
+	   those target specific, but I guess if it's folded away then that's
+	   fine) and get away with the overload set."  */
+	tree arg2_type = ptr_type_node;
+	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+	   the tree using the value from arg1.  The resulting type will match
+	   the type of arg2.  */
+	gimple_seq stmts = NULL;
+	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg2_type, arg2, temp_offset);
+	/* Mask off any lower bits from the address.  */
+	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
+					  arg2_type, temp_addr,
+					  build_int_cst (arg2_type, -16));
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	if (!is_gimple_mem_ref_addr (aligned_addr))
+	  {
+	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
+	    gimple *g = gimple_build_assign (t, aligned_addr);
+	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	    aligned_addr = t;
+	  }
+	/* The desired gimple result should be similar to:
+	   MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
+	gimple *g
+	  = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
+					 build_int_cst (arg2_type, 0)), arg0);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* Unaligned vector loads.  */
+    case RS6000_BIF_LXVW4X_V16QI:
+    case RS6000_BIF_LXVW4X_V8HI:
+    case RS6000_BIF_LXVW4X_V4SF:
+    case RS6000_BIF_LXVW4X_V4SI:
+    case RS6000_BIF_LXVD2X_V2DF:
+    case RS6000_BIF_LXVD2X_V2DI:
+      {
+	arg0 = gimple_call_arg (stmt, 0);  // offset
+	arg1 = gimple_call_arg (stmt, 1);  // address
+	lhs = gimple_call_lhs (stmt);
+	location_t loc = gimple_location (stmt);
+	/* Since arg1 may be cast to a different type, just use ptr_type_node
+	   here instead of trying to enforce TBAA on pointer types.  */
+	tree arg1_type = ptr_type_node;
+	tree lhs_type = TREE_TYPE (lhs);
+	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
+	   required alignment (power) is 4 bytes regardless of data type.  */
+	tree align_ltype = build_aligned_type (lhs_type, 4);
+	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+	   the tree using the value from arg0.  The resulting type will match
+	   the type of arg1.  */
+	gimple_seq stmts = NULL;
+	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
+	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg1_type, arg1, temp_offset);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	if (!is_gimple_mem_ref_addr (temp_addr))
+	  {
+	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
+	    gimple *g = gimple_build_assign (t, temp_addr);
+	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	    temp_addr = t;
+	  }
+	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
+	   take an offset, but since we've already incorporated the offset
+	   above, here we just pass in a zero.  */
+	gimple *g;
+	g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
+					      build_int_cst (arg1_type, 0)));
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* Unaligned vector stores.  */
+    case RS6000_BIF_STXVW4X_V16QI:
+    case RS6000_BIF_STXVW4X_V8HI:
+    case RS6000_BIF_STXVW4X_V4SF:
+    case RS6000_BIF_STXVW4X_V4SI:
+    case RS6000_BIF_STXVD2X_V2DF:
+    case RS6000_BIF_STXVD2X_V2DI:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
+	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
+	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
+	location_t loc = gimple_location (stmt);
+	tree arg0_type = TREE_TYPE (arg0);
+	/* Use ptr_type_node (no TBAA) for the arg2_type.  */
+	tree arg2_type = ptr_type_node;
+	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
+	   required alignment (power) is 4 bytes regardless of data type.  */
+	tree align_stype = build_aligned_type (arg0_type, 4);
+	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
+	   the tree using the value from arg1.  */
+	gimple_seq stmts = NULL;
+	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
+	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
+				       arg2_type, arg2, temp_offset);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	if (!is_gimple_mem_ref_addr (temp_addr))
+	  {
+	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
+	    gimple *g = gimple_build_assign (t, temp_addr);
+	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
+	    temp_addr = t;
+	  }
+	gimple *g;
+	g = gimple_build_assign (build2 (MEM_REF, align_stype, temp_addr,
+					 build_int_cst (arg2_type, 0)), arg0);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* Vector Fused multiply-add (fma).  */
+    case RS6000_BIF_VMADDFP:
+    case RS6000_BIF_XVMADDDP:
+    case RS6000_BIF_XVMADDSP:
+    case RS6000_BIF_VMLADDUHM:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	tree arg2 = gimple_call_arg (stmt, 2);
+	lhs = gimple_call_lhs (stmt);
+	gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
+	gimple_call_set_lhs (g, lhs);
+	gimple_call_set_nothrow (g, true);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* Vector compares; EQ, NE, GE, GT, LE.  */
+    case RS6000_BIF_VCMPEQUB:
+    case RS6000_BIF_VCMPEQUH:
+    case RS6000_BIF_VCMPEQUW:
+    case RS6000_BIF_VCMPEQUD:
+    /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
+       folding produces worse code for 128-bit compares.  */
+      fold_compare_helper (gsi, EQ_EXPR, stmt);
+      return true;
+
+    case RS6000_BIF_VCMPNEB:
+    case RS6000_BIF_VCMPNEH:
+    case RS6000_BIF_VCMPNEW:
+    /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
+       folding produces worse code for 128-bit compares.  */
+      fold_compare_helper (gsi, NE_EXPR, stmt);
+      return true;
+
+    case RS6000_BIF_CMPGE_16QI:
+    case RS6000_BIF_CMPGE_U16QI:
+    case RS6000_BIF_CMPGE_8HI:
+    case RS6000_BIF_CMPGE_U8HI:
+    case RS6000_BIF_CMPGE_4SI:
+    case RS6000_BIF_CMPGE_U4SI:
+    case RS6000_BIF_CMPGE_2DI:
+    case RS6000_BIF_CMPGE_U2DI:
+    /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
+       for now, because gimple folding produces worse code for 128-bit
+       compares.  */
+      fold_compare_helper (gsi, GE_EXPR, stmt);
+      return true;
+
+    case RS6000_BIF_VCMPGTSB:
+    case RS6000_BIF_VCMPGTUB:
+    case RS6000_BIF_VCMPGTSH:
+    case RS6000_BIF_VCMPGTUH:
+    case RS6000_BIF_VCMPGTSW:
+    case RS6000_BIF_VCMPGTUW:
+    case RS6000_BIF_VCMPGTUD:
+    case RS6000_BIF_VCMPGTSD:
+    /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
+       for now, because gimple folding produces worse code for 128-bit
+       compares.  */
+      fold_compare_helper (gsi, GT_EXPR, stmt);
+      return true;
+
+    case RS6000_BIF_CMPLE_16QI:
+    case RS6000_BIF_CMPLE_U16QI:
+    case RS6000_BIF_CMPLE_8HI:
+    case RS6000_BIF_CMPLE_U8HI:
+    case RS6000_BIF_CMPLE_4SI:
+    case RS6000_BIF_CMPLE_U4SI:
+    case RS6000_BIF_CMPLE_2DI:
+    case RS6000_BIF_CMPLE_U2DI:
+    /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
+       for now, because gimple folding produces worse code for 128-bit
+       compares.  */
+      fold_compare_helper (gsi, LE_EXPR, stmt);
+      return true;
+
+    /* flavors of vec_splat_[us]{8,16,32}.  */
+    case RS6000_BIF_VSPLTISB:
+    case RS6000_BIF_VSPLTISH:
+    case RS6000_BIF_VSPLTISW:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	lhs = gimple_call_lhs (stmt);
+
+	/* Only fold vec_splat_*() if the lower bits of arg0 form a 5-bit
+	   signed constant in the range -16 to +15.  */
+	if (TREE_CODE (arg0) != INTEGER_CST
+	    || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))
+	  return false;
+	gimple_seq stmts = NULL;
+	location_t loc = gimple_location (stmt);
+	tree splat_value = gimple_convert (&stmts, loc,
+					   TREE_TYPE (TREE_TYPE (lhs)), arg0);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
+	g = gimple_build_assign (lhs, splat_tree);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* Flavors of vec_splat.  */
+    /* a = vec_splat (b, 0x3) becomes a = { b[3], b[3], b[3], ... };  */
+    case RS6000_BIF_VSPLTB:
+    case RS6000_BIF_VSPLTH:
+    case RS6000_BIF_VSPLTW:
+    case RS6000_BIF_XXSPLTD_V2DI:
+    case RS6000_BIF_XXSPLTD_V2DF:
+      {
+	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
+	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
+	/* Only fold vec_splat () if arg1 is a constant value and a valid
+	   index into the arg0 vector.  */
+	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
+	if (TREE_CODE (arg1) != INTEGER_CST
+	    || TREE_INT_CST_LOW (arg1) > (n_elts - 1))
+	  return false;
+	lhs = gimple_call_lhs (stmt);
+	tree lhs_type = TREE_TYPE (lhs);
+	tree arg0_type = TREE_TYPE (arg0);
+	tree splat;
+	if (TREE_CODE (arg0) == VECTOR_CST)
+	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
+	else
+	  {
+	    /* Determine (in bits) the length and start location of the
+	       splat value for a call to the tree_vec_extract helper.  */
+	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
+				  * BITS_PER_UNIT / n_elts;
+	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
+	    tree len = build_int_cst (bitsizetype, splat_elem_size);
+	    tree start = build_int_cst (bitsizetype, splat_start_bit);
+	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
+				      len, start);
+	  }
+	/* And finally, build the new vector.  */
+	tree splat_tree = build_vector_from_val (lhs_type, splat);
+	g = gimple_build_assign (lhs, splat_tree);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* vec_mergel (integrals).  */
+    case RS6000_BIF_VMRGLH:
+    case RS6000_BIF_VMRGLW:
+    case RS6000_BIF_XXMRGLW_4SI:
+    case RS6000_BIF_VMRGLB:
+    case RS6000_BIF_VEC_MERGEL_V2DI:
+    case RS6000_BIF_XXMRGLW_4SF:
+    case RS6000_BIF_VEC_MERGEL_V2DF:
+      fold_mergehl_helper (gsi, stmt, 1);
+      return true;
+    /* vec_mergeh (integrals).  */
+    case RS6000_BIF_VMRGHH:
+    case RS6000_BIF_VMRGHW:
+    case RS6000_BIF_XXMRGHW_4SI:
+    case RS6000_BIF_VMRGHB:
+    case RS6000_BIF_VEC_MERGEH_V2DI:
+    case RS6000_BIF_XXMRGHW_4SF:
+    case RS6000_BIF_VEC_MERGEH_V2DF:
+      fold_mergehl_helper (gsi, stmt, 0);
+      return true;
+
+    /* Flavors of vec_mergee.  */
+    case RS6000_BIF_VMRGEW_V4SI:
+    case RS6000_BIF_VMRGEW_V2DI:
+    case RS6000_BIF_VMRGEW_V4SF:
+    case RS6000_BIF_VMRGEW_V2DF:
+      fold_mergeeo_helper (gsi, stmt, 0);
+      return true;
+    /* Flavors of vec_mergeo.  */
+    case RS6000_BIF_VMRGOW_V4SI:
+    case RS6000_BIF_VMRGOW_V2DI:
+    case RS6000_BIF_VMRGOW_V4SF:
+    case RS6000_BIF_VMRGOW_V2DF:
+      fold_mergeeo_helper (gsi, stmt, 1);
+      return true;
+
+    /* d = vec_pack (a, b).  */
+    case RS6000_BIF_VPKUDUM:
+    case RS6000_BIF_VPKUHUM:
+    case RS6000_BIF_VPKUWUM:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	lhs = gimple_call_lhs (stmt);
+	gimple *g = gimple_build_assign (lhs, VEC_PACK_TRUNC_EXPR, arg0, arg1);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    /* d = vec_unpackh (a).  */
+    /* Note that the UNPACK_{HI,LO}_EXPR used in the gimple_build_assign call
+       in this code is sensitive to endian-ness, and needs to be inverted to
+       handle both LE and BE targets.  */
+    case RS6000_BIF_VUPKHSB:
+    case RS6000_BIF_VUPKHSH:
+    case RS6000_BIF_VUPKHSW:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	lhs = gimple_call_lhs (stmt);
+	if (BYTES_BIG_ENDIAN)
+	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
+	else
+	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    /* d = vec_unpackl (a).  */
+    case RS6000_BIF_VUPKLSB:
+    case RS6000_BIF_VUPKLSH:
+    case RS6000_BIF_VUPKLSW:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	lhs = gimple_call_lhs (stmt);
+	if (BYTES_BIG_ENDIAN)
+	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
+	else
+	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
+	gimple_set_location (g, gimple_location (stmt));
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+    /* There is no gimple type corresponding to pixel, so just return.  */
+    case RS6000_BIF_VUPKHPX:
+    case RS6000_BIF_VUPKLPX:
+      return false;
+
+    /* vec_perm.  */
+    case RS6000_BIF_VPERM_16QI:
+    case RS6000_BIF_VPERM_8HI:
+    case RS6000_BIF_VPERM_4SI:
+    case RS6000_BIF_VPERM_2DI:
+    case RS6000_BIF_VPERM_4SF:
+    case RS6000_BIF_VPERM_2DF:
+    case RS6000_BIF_VPERM_16QI_UNS:
+    case RS6000_BIF_VPERM_8HI_UNS:
+    case RS6000_BIF_VPERM_4SI_UNS:
+    case RS6000_BIF_VPERM_2DI_UNS:
+      {
+	arg0 = gimple_call_arg (stmt, 0);
+	arg1 = gimple_call_arg (stmt, 1);
+	tree permute = gimple_call_arg (stmt, 2);
+	lhs = gimple_call_lhs (stmt);
+	location_t loc = gimple_location (stmt);
+	gimple_seq stmts = NULL;
+	// Convert arg0 and arg1 to match the type of the permute
+	// for the VEC_PERM_EXPR operation.
+	tree permute_type = TREE_TYPE (permute);
+	tree arg0_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
+					permute_type, arg0);
+	tree arg1_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
+					permute_type, arg1);
+	tree lhs_ptype = gimple_build (&stmts, loc, VEC_PERM_EXPR,
+				       permute_type, arg0_ptype, arg1_ptype,
+				       permute);
+	// Convert the result back to the desired lhs type upon completion.
+	tree temp = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
+				  TREE_TYPE (lhs), lhs_ptype);
+	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
+	g = gimple_build_assign (lhs, temp);
+	gimple_set_location (g, loc);
+	gsi_replace (gsi, g, true);
+	return true;
+      }
+
+    default:
+      if (TARGET_DEBUG_BUILTIN)
+	fprintf (stderr, "gimple builtin intrinsic not matched: %d %s %s\n",
+		 fn_code, fn_name1, fn_name2);
+      break;
+    }
+
+  return false;
+}
+
 /* Expand an expression EXP that calls a built-in function,
    with result going to TARGET if that's convenient
    (and in mode MODE if that's convenient).
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 04/18] rs6000: Handle some recent MMA builtin changes
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (2 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 03/18] rs6000: Handle gimple folding of target built-ins Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-09-13 19:02   ` will schmidt
  2021-09-16 23:38   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 05/18] rs6000: Support for vectorizing built-in functions Bill Schmidt
                   ` (14 subsequent siblings)
  18 siblings, 2 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Peter Bergner recently added two new builtins, __builtin_vsx_lxvp and
__builtin_vsx_stxvp.  These happened to break a pattern I had been using
to automate gimple folding of MMA builtins.  Previously,
every MMA function that could be folded had an associated internal function
that it was folded into.  The LXVP/STXVP builtins are just folded directly
into memory operations.
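
For reference, the new folding is shaped like the other load folding in
this series.  Here is a minimal sketch only, modeled on the LVX case
from patch 03/18; the exact types and temporaries in the real hunk may
differ:

  /* Hedged sketch: fold lxvp (offset, addr) into an ordinary MEM_REF
     load of a vector pair.  The vector_pair_type_node reference and
     zero-offset MEM_REF are illustrative assumptions, and the real code
     must also force the address into an SSA name where needed.  */
  case RS6000_BIF_LXVP:
    {
      tree offset = gimple_call_arg (stmt, 0);
      tree addr = gimple_call_arg (stmt, 1);
      tree lhs = gimple_call_lhs (stmt);
      location_t loc = gimple_location (stmt);
      gimple_seq stmts = NULL;
      /* POINTER_PLUS_EXPR wants the offset in sizetype.  */
      tree off = gimple_convert (&stmts, loc, sizetype, offset);
      tree ea = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
			      ptr_type_node, addr, off);
      gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
      gimple *g
	= gimple_build_assign (lhs,
			       build2 (MEM_REF, vector_pair_type_node, ea,
				       build_int_cst (ptr_type_node, 0)));
      gimple_set_location (g, loc);
      gsi_replace (gsi, g, true);
      return true;
    }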

Instead of relying on this pattern, this patch adds a new builtin
attribute called "mmaint," which is set for all MMA builtins that have an
associated internal builtin.  The naming convention that adds _INTERNAL to
the builtin index name remains.
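
In the generator this amounts to one more attribute bit.  A minimal
sketch of the accessors write_decls emits for it follows; the actual bit
position is chosen by rs6000-gen-builtins.c, so the value below is only
an illustrative assumption:

  #define bif_mmaint_bit	(0x8000)
  #define bif_is_mmaint(x)	((x).bifattrs & bif_mmaint_bit)

The gimple folding code can then test bif_is_mmaint rather than assume
that every MMA builtin has an _INTERNAL twin.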

The rest of the patch just duplicates Peter's patch, using the new
builtin infrastructure.

2021-08-23  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
	(ASSEMBLE_PAIR): Likewise.
	(BUILD_ACC): Likewise.
	(DISASSEMBLE_ACC): Likewise.
	(DISASSEMBLE_PAIR): Likewise.
	(PMXVBF16GER2): Likewise.
	(PMXVBF16GER2NN): Likewise.
	(PMXVBF16GER2NP): Likewise.
	(PMXVBF16GER2PN): Likewise.
	(PMXVBF16GER2PP): Likewise.
	(PMXVF16GER2): Likewise.
	(PMXVF16GER2NN): Likewise.
	(PMXVF16GER2NP): Likewise.
	(PMXVF16GER2PN): Likewise.
	(PMXVF16GER2PP): Likewise.
	(PMXVF32GER): Likewise.
	(PMXVF32GERNN): Likewise.
	(PMXVF32GERNP): Likewise.
	(PMXVF32GERPN): Likewise.
	(PMXVF32GERPP): Likewise.
	(PMXVF64GER): Likewise.
	(PMXVF64GERNN): Likewise.
	(PMXVF64GERNP): Likewise.
	(PMXVF64GERPN): Likewise.
	(PMXVF64GERPP): Likewise.
	(PMXVI16GER2): Likewise.
	(PMXVI16GER2PP): Likewise.
	(PMXVI16GER2S): Likewise.
	(PMXVI16GER2SPP): Likewise.
	(PMXVI4GER8): Likewise.
	(PMXVI4GER8PP): Likewise.
	(PMXVI8GER4): Likewise.
	(PMXVI8GER4PP): Likewise.
	(PMXVI8GER4SPP): Likewise.
	(XVBF16GER2): Likewise.
	(XVBF16GER2NN): Likewise.
	(XVBF16GER2NP): Likewise.
	(XVBF16GER2PN): Likewise.
	(XVBF16GER2PP): Likewise.
	(XVF16GER2): Likewise.
	(XVF16GER2NN): Likewise.
	(XVF16GER2NP): Likewise.
	(XVF16GER2PN): Likewise.
	(XVF16GER2PP): Likewise.
	(XVF32GER): Likewise.
	(XVF32GERNN): Likewise.
	(XVF32GERNP): Likewise.
	(XVF32GERPN): Likewise.
	(XVF32GERPP): Likewise.
	(XVF64GER): Likewise.
	(XVF64GERNN): Likewise.
	(XVF64GERNP): Likewise.
	(XVF64GERPN): Likewise.
	(XVF64GERPP): Likewise.
	(XVI16GER2): Likewise.
	(XVI16GER2PP): Likewise.
	(XVI16GER2S): Likewise.
	(XVI16GER2SPP): Likewise.
	(XVI4GER8): Likewise.
	(XVI4GER8PP): Likewise.
	(XVI8GER4): Likewise.
	(XVI8GER4PP): Likewise.
	(XVI8GER4SPP): Likewise.
	(XXMFACC): Likewise.
	(XXMTACC): Likewise.
	(XXSETACCZ): Likewise.
	(ASSEMBLE_PAIR_V): Likewise.
	(BUILD_PAIR): Likewise.
	(DISASSEMBLE_PAIR_V): Likewise.
	(LXVP): New.
	(STXVP): New.
	* config/rs6000/rs6000-call.c
	(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
	RS6000_BIF_STXVP.
	* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
	(parse_bif_attrs): Handle ismmaint.
	(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
	(write_bif_static_init): Handle ismmaint.
---
 gcc/config/rs6000/rs6000-builtin-new.def | 145 ++++++++++++-----------
 gcc/config/rs6000/rs6000-call.c          |  38 +++++-
 gcc/config/rs6000/rs6000-gen-builtins.c  |  38 +++---
 3 files changed, 135 insertions(+), 86 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index a8c6b9e988f..1966516551e 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -129,6 +129,7 @@
 ;   mma      Needs special handling for MMA
 ;   quad     MMA instruction using a register quad as an input operand
 ;   pair     MMA instruction using a register pair as an input operand
+;   mmaint   MMA instruction expanding to internal call at GIMPLE time
 ;   no32bit  Not valid for TARGET_32BIT
 ;   32bit    Requires different handling for TARGET_32BIT
 ;   cpu      This is a "cpu_is" or "cpu_supports" builtin
@@ -3584,415 +3585,421 @@
 
 [mma]
   void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
-    ASSEMBLE_ACC nothing {mma}
+    ASSEMBLE_ACC nothing {mma,mmaint}
 
   v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
     ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
 
   void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
-    ASSEMBLE_PAIR nothing {mma}
+    ASSEMBLE_PAIR nothing {mma,mmaint}
 
   v256 __builtin_mma_assemble_pair_internal (vuc, vuc);
     ASSEMBLE_PAIR_INTERNAL vsx_assemble_pair {mma}
 
   void __builtin_mma_build_acc (v512 *, vuc, vuc, vuc, vuc);
-    BUILD_ACC nothing {mma}
+    BUILD_ACC nothing {mma,mmaint}
 
   v512 __builtin_mma_build_acc_internal (vuc, vuc, vuc, vuc);
     BUILD_ACC_INTERNAL mma_assemble_acc {mma}
 
   void __builtin_mma_disassemble_acc (void *, v512 *);
-    DISASSEMBLE_ACC nothing {mma,quad}
+    DISASSEMBLE_ACC nothing {mma,quad,mmaint}
 
   vuc __builtin_mma_disassemble_acc_internal (v512, const int<2>);
     DISASSEMBLE_ACC_INTERNAL mma_disassemble_acc {mma}
 
   void __builtin_mma_disassemble_pair (void *, v256 *);
-    DISASSEMBLE_PAIR nothing {mma,pair}
+    DISASSEMBLE_PAIR nothing {mma,pair,mmaint}
 
   vuc __builtin_mma_disassemble_pair_internal (v256, const int<2>);
     DISASSEMBLE_PAIR_INTERNAL vsx_disassemble_pair {mma}
 
   void __builtin_mma_pmxvbf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVBF16GER2 nothing {mma}
+    PMXVBF16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvbf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVBF16GER2_INTERNAL mma_pmxvbf16ger2 {mma}
 
   void __builtin_mma_pmxvbf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVBF16GER2NN nothing {mma,quad}
+    PMXVBF16GER2NN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvbf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVBF16GER2NN_INTERNAL mma_pmxvbf16ger2nn {mma,quad}
 
   void __builtin_mma_pmxvbf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVBF16GER2NP nothing {mma,quad}
+    PMXVBF16GER2NP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvbf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVBF16GER2NP_INTERNAL mma_pmxvbf16ger2np {mma,quad}
 
   void __builtin_mma_pmxvbf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVBF16GER2PN nothing {mma,quad}
+    PMXVBF16GER2PN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvbf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVBF16GER2PN_INTERNAL mma_pmxvbf16ger2pn {mma,quad}
 
   void __builtin_mma_pmxvbf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVBF16GER2PP nothing {mma,quad}
+    PMXVBF16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvbf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVBF16GER2PP_INTERNAL mma_pmxvbf16ger2pp {mma,quad}
 
   void __builtin_mma_pmxvf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVF16GER2 nothing {mma}
+    PMXVF16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVF16GER2_INTERNAL mma_pmxvf16ger2 {mma}
 
   void __builtin_mma_pmxvf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVF16GER2NN nothing {mma,quad}
+    PMXVF16GER2NN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVF16GER2NN_INTERNAL mma_pmxvf16ger2nn {mma,quad}
 
   void __builtin_mma_pmxvf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVF16GER2NP nothing {mma,quad}
+    PMXVF16GER2NP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVF16GER2NP_INTERNAL mma_pmxvf16ger2np {mma,quad}
 
   void __builtin_mma_pmxvf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVF16GER2PN nothing {mma,quad}
+    PMXVF16GER2PN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVF16GER2PN_INTERNAL mma_pmxvf16ger2pn {mma,quad}
 
   void __builtin_mma_pmxvf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVF16GER2PP nothing {mma,quad}
+    PMXVF16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVF16GER2PP_INTERNAL mma_pmxvf16ger2pp {mma,quad}
 
   void __builtin_mma_pmxvf32ger (v512 *, vuc, vuc, const int<4>, const int<4>);
-    PMXVF32GER nothing {mma}
+    PMXVF32GER nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvf32ger_internal (vuc, vuc, const int<4>, const int<4>);
     PMXVF32GER_INTERNAL mma_pmxvf32ger {mma}
 
   void __builtin_mma_pmxvf32gernn (v512 *, vuc, vuc, const int<4>, const int<4>);
-    PMXVF32GERNN nothing {mma,quad}
+    PMXVF32GERNN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf32gernn_internal (v512, vuc, vuc, const int<4>, const int<4>);
     PMXVF32GERNN_INTERNAL mma_pmxvf32gernn {mma,quad}
 
   void __builtin_mma_pmxvf32gernp (v512 *, vuc, vuc, const int<4>, const int<4>);
-    PMXVF32GERNP nothing {mma,quad}
+    PMXVF32GERNP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf32gernp_internal (v512, vuc, vuc, const int<4>, const int<4>);
     PMXVF32GERNP_INTERNAL mma_pmxvf32gernp {mma,quad}
 
   void __builtin_mma_pmxvf32gerpn (v512 *, vuc, vuc, const int<4>, const int<4>);
-    PMXVF32GERPN nothing {mma,quad}
+    PMXVF32GERPN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf32gerpn_internal (v512, vuc, vuc, const int<4>, const int<4>);
     PMXVF32GERPN_INTERNAL mma_pmxvf32gerpn {mma,quad}
 
   void __builtin_mma_pmxvf32gerpp (v512 *, vuc, vuc, const int<4>, const int<4>);
-    PMXVF32GERPP nothing {mma,quad}
+    PMXVF32GERPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvf32gerpp_internal (v512, vuc, vuc, const int<4>, const int<4>);
     PMXVF32GERPP_INTERNAL mma_pmxvf32gerpp {mma,quad}
 
   void __builtin_mma_pmxvf64ger (v512 *, v256, vuc, const int<4>, const int<2>);
-    PMXVF64GER nothing {mma,pair}
+    PMXVF64GER nothing {mma,pair,mmaint}
 
   v512 __builtin_mma_pmxvf64ger_internal (v256, vuc, const int<4>, const int<2>);
     PMXVF64GER_INTERNAL mma_pmxvf64ger {mma,pair}
 
   void __builtin_mma_pmxvf64gernn (v512 *, v256, vuc, const int<4>, const int<2>);
-    PMXVF64GERNN nothing {mma,pair,quad}
+    PMXVF64GERNN nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_pmxvf64gernn_internal (v512, v256, vuc, const int<4>, const int<2>);
     PMXVF64GERNN_INTERNAL mma_pmxvf64gernn {mma,pair,quad}
 
   void __builtin_mma_pmxvf64gernp (v512 *, v256, vuc, const int<4>, const int<2>);
-    PMXVF64GERNP nothing {mma,pair,quad}
+    PMXVF64GERNP nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_pmxvf64gernp_internal (v512, v256, vuc, const int<4>, const int<2>);
     PMXVF64GERNP_INTERNAL mma_pmxvf64gernp {mma,pair,quad}
 
   void __builtin_mma_pmxvf64gerpn (v512 *, v256, vuc, const int<4>, const int<2>);
-    PMXVF64GERPN nothing {mma,pair,quad}
+    PMXVF64GERPN nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_pmxvf64gerpn_internal (v512, v256, vuc, const int<4>, const int<2>);
     PMXVF64GERPN_INTERNAL mma_pmxvf64gerpn {mma,pair,quad}
 
   void __builtin_mma_pmxvf64gerpp (v512 *, v256, vuc, const int<4>, const int<2>);
-    PMXVF64GERPP nothing {mma,pair,quad}
+    PMXVF64GERPP nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_pmxvf64gerpp_internal (v512, v256, vuc, const int<4>, const int<2>);
     PMXVF64GERPP_INTERNAL mma_pmxvf64gerpp {mma,pair,quad}
 
   void __builtin_mma_pmxvi16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVI16GER2 nothing {mma}
+    PMXVI16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvi16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVI16GER2_INTERNAL mma_pmxvi16ger2 {mma}
 
   void __builtin_mma_pmxvi16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVI16GER2PP nothing {mma,quad}
+    PMXVI16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvi16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVI16GER2PP_INTERNAL mma_pmxvi16ger2pp {mma,quad}
 
   void __builtin_mma_pmxvi16ger2s (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVI16GER2S nothing {mma}
+    PMXVI16GER2S nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvi16ger2s_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVI16GER2S_INTERNAL mma_pmxvi16ger2s {mma}
 
   void __builtin_mma_pmxvi16ger2spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
-    PMXVI16GER2SPP nothing {mma,quad}
+    PMXVI16GER2SPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvi16ger2spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
     PMXVI16GER2SPP_INTERNAL mma_pmxvi16ger2spp {mma,quad}
 
   void __builtin_mma_pmxvi4ger8 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<8>);
-    PMXVI4GER8 nothing {mma}
+    PMXVI4GER8 nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvi4ger8_internal (vuc, vuc, const int<4>, const int<4>, const int<8>);
     PMXVI4GER8_INTERNAL mma_pmxvi4ger8 {mma}
 
   void __builtin_mma_pmxvi4ger8pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
-    PMXVI4GER8PP nothing {mma,quad}
+    PMXVI4GER8PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvi4ger8pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
     PMXVI4GER8PP_INTERNAL mma_pmxvi4ger8pp {mma,quad}
 
   void __builtin_mma_pmxvi8ger4 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
-    PMXVI8GER4 nothing {mma}
+    PMXVI8GER4 nothing {mma,mmaint}
 
   v512 __builtin_mma_pmxvi8ger4_internal (vuc, vuc, const int<4>, const int<4>, const int<4>);
     PMXVI8GER4_INTERNAL mma_pmxvi8ger4 {mma}
 
   void __builtin_mma_pmxvi8ger4pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
-    PMXVI8GER4PP nothing {mma,quad}
+    PMXVI8GER4PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvi8ger4pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
     PMXVI8GER4PP_INTERNAL mma_pmxvi8ger4pp {mma,quad}
 
   void __builtin_mma_pmxvi8ger4spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
-    PMXVI8GER4SPP nothing {mma,quad}
+    PMXVI8GER4SPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_pmxvi8ger4spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
     PMXVI8GER4SPP_INTERNAL mma_pmxvi8ger4spp {mma,quad}
 
   void __builtin_mma_xvbf16ger2 (v512 *, vuc, vuc);
-    XVBF16GER2 nothing {mma}
+    XVBF16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_xvbf16ger2_internal (vuc, vuc);
     XVBF16GER2_INTERNAL mma_xvbf16ger2 {mma}
 
   void __builtin_mma_xvbf16ger2nn (v512 *, vuc, vuc);
-    XVBF16GER2NN nothing {mma,quad}
+    XVBF16GER2NN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvbf16ger2nn_internal (v512, vuc, vuc);
     XVBF16GER2NN_INTERNAL mma_xvbf16ger2nn {mma,quad}
 
   void __builtin_mma_xvbf16ger2np (v512 *, vuc, vuc);
-    XVBF16GER2NP nothing {mma,quad}
+    XVBF16GER2NP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvbf16ger2np_internal (v512, vuc, vuc);
     XVBF16GER2NP_INTERNAL mma_xvbf16ger2np {mma,quad}
 
   void __builtin_mma_xvbf16ger2pn (v512 *, vuc, vuc);
-    XVBF16GER2PN nothing {mma,quad}
+    XVBF16GER2PN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvbf16ger2pn_internal (v512, vuc, vuc);
     XVBF16GER2PN_INTERNAL mma_xvbf16ger2pn {mma,quad}
 
   void __builtin_mma_xvbf16ger2pp (v512 *, vuc, vuc);
-    XVBF16GER2PP nothing {mma,quad}
+    XVBF16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvbf16ger2pp_internal (v512, vuc, vuc);
     XVBF16GER2PP_INTERNAL mma_xvbf16ger2pp {mma,quad}
 
   void __builtin_mma_xvf16ger2 (v512 *, vuc, vuc);
-    XVF16GER2 nothing {mma}
+    XVF16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_xvf16ger2_internal (vuc, vuc);
     XVF16GER2_INTERNAL mma_xvf16ger2 {mma}
 
   void __builtin_mma_xvf16ger2nn (v512 *, vuc, vuc);
-    XVF16GER2NN nothing {mma,quad}
+    XVF16GER2NN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf16ger2nn_internal (v512, vuc, vuc);
     XVF16GER2NN_INTERNAL mma_xvf16ger2nn {mma,quad}
 
   void __builtin_mma_xvf16ger2np (v512 *, vuc, vuc);
-    XVF16GER2NP nothing {mma,quad}
+    XVF16GER2NP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf16ger2np_internal (v512, vuc, vuc);
     XVF16GER2NP_INTERNAL mma_xvf16ger2np {mma,quad}
 
   void __builtin_mma_xvf16ger2pn (v512 *, vuc, vuc);
-    XVF16GER2PN nothing {mma,quad}
+    XVF16GER2PN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf16ger2pn_internal (v512, vuc, vuc);
     XVF16GER2PN_INTERNAL mma_xvf16ger2pn {mma,quad}
 
   void __builtin_mma_xvf16ger2pp (v512 *, vuc, vuc);
-    XVF16GER2PP nothing {mma,quad}
+    XVF16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf16ger2pp_internal (v512, vuc, vuc);
     XVF16GER2PP_INTERNAL mma_xvf16ger2pp {mma,quad}
 
   void __builtin_mma_xvf32ger (v512 *, vuc, vuc);
-    XVF32GER nothing {mma}
+    XVF32GER nothing {mma,mmaint}
 
   v512 __builtin_mma_xvf32ger_internal (vuc, vuc);
     XVF32GER_INTERNAL mma_xvf32ger {mma}
 
   void __builtin_mma_xvf32gernn (v512 *, vuc, vuc);
-    XVF32GERNN nothing {mma,quad}
+    XVF32GERNN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf32gernn_internal (v512, vuc, vuc);
     XVF32GERNN_INTERNAL mma_xvf32gernn {mma,quad}
 
   void __builtin_mma_xvf32gernp (v512 *, vuc, vuc);
-    XVF32GERNP nothing {mma,quad}
+    XVF32GERNP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf32gernp_internal (v512, vuc, vuc);
     XVF32GERNP_INTERNAL mma_xvf32gernp {mma,quad}
 
   void __builtin_mma_xvf32gerpn (v512 *, vuc, vuc);
-    XVF32GERPN nothing {mma,quad}
+    XVF32GERPN nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf32gerpn_internal (v512, vuc, vuc);
     XVF32GERPN_INTERNAL mma_xvf32gerpn {mma,quad}
 
   void __builtin_mma_xvf32gerpp (v512 *, vuc, vuc);
-    XVF32GERPP nothing {mma,quad}
+    XVF32GERPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvf32gerpp_internal (v512, vuc, vuc);
     XVF32GERPP_INTERNAL mma_xvf32gerpp {mma,quad}
 
   void __builtin_mma_xvf64ger (v512 *, v256, vuc);
-    XVF64GER nothing {mma,pair}
+    XVF64GER nothing {mma,pair,mmaint}
 
   v512 __builtin_mma_xvf64ger_internal (v256, vuc);
     XVF64GER_INTERNAL mma_xvf64ger {mma,pair}
 
   void __builtin_mma_xvf64gernn (v512 *, v256, vuc);
-    XVF64GERNN nothing {mma,pair,quad}
+    XVF64GERNN nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_xvf64gernn_internal (v512, v256, vuc);
     XVF64GERNN_INTERNAL mma_xvf64gernn {mma,pair,quad}
 
   void __builtin_mma_xvf64gernp (v512 *, v256, vuc);
-    XVF64GERNP nothing {mma,pair,quad}
+    XVF64GERNP nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_xvf64gernp_internal (v512, v256, vuc);
     XVF64GERNP_INTERNAL mma_xvf64gernp {mma,pair,quad}
 
   void __builtin_mma_xvf64gerpn (v512 *, v256, vuc);
-    XVF64GERPN nothing {mma,pair,quad}
+    XVF64GERPN nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_xvf64gerpn_internal (v512, v256, vuc);
     XVF64GERPN_INTERNAL mma_xvf64gerpn {mma,pair,quad}
 
   void __builtin_mma_xvf64gerpp (v512 *, v256, vuc);
-    XVF64GERPP nothing {mma,pair,quad}
+    XVF64GERPP nothing {mma,pair,quad,mmaint}
 
   v512 __builtin_mma_xvf64gerpp_internal (v512, v256, vuc);
     XVF64GERPP_INTERNAL mma_xvf64gerpp {mma,pair,quad}
 
   void __builtin_mma_xvi16ger2 (v512 *, vuc, vuc);
-    XVI16GER2 nothing {mma}
+    XVI16GER2 nothing {mma,mmaint}
 
   v512 __builtin_mma_xvi16ger2_internal (vuc, vuc);
     XVI16GER2_INTERNAL mma_xvi16ger2 {mma}
 
   void __builtin_mma_xvi16ger2pp (v512 *, vuc, vuc);
-    XVI16GER2PP nothing {mma,quad}
+    XVI16GER2PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvi16ger2pp_internal (v512, vuc, vuc);
     XVI16GER2PP_INTERNAL mma_xvi16ger2pp {mma,quad}
 
   void __builtin_mma_xvi16ger2s (v512 *, vuc, vuc);
-    XVI16GER2S nothing {mma}
+    XVI16GER2S nothing {mma,mmaint}
 
   v512 __builtin_mma_xvi16ger2s_internal (vuc, vuc);
     XVI16GER2S_INTERNAL mma_xvi16ger2s {mma}
 
   void __builtin_mma_xvi16ger2spp (v512 *, vuc, vuc);
-    XVI16GER2SPP nothing {mma,quad}
+    XVI16GER2SPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvi16ger2spp_internal (v512, vuc, vuc);
     XVI16GER2SPP_INTERNAL mma_xvi16ger2spp {mma,quad}
 
   void __builtin_mma_xvi4ger8 (v512 *, vuc, vuc);
-    XVI4GER8 nothing {mma}
+    XVI4GER8 nothing {mma,mmaint}
 
   v512 __builtin_mma_xvi4ger8_internal (vuc, vuc);
     XVI4GER8_INTERNAL mma_xvi4ger8 {mma}
 
   void __builtin_mma_xvi4ger8pp (v512 *, vuc, vuc);
-    XVI4GER8PP nothing {mma,quad}
+    XVI4GER8PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvi4ger8pp_internal (v512, vuc, vuc);
     XVI4GER8PP_INTERNAL mma_xvi4ger8pp {mma,quad}
 
   void __builtin_mma_xvi8ger4 (v512 *, vuc, vuc);
-    XVI8GER4 nothing {mma}
+    XVI8GER4 nothing {mma,mmaint}
 
   v512 __builtin_mma_xvi8ger4_internal (vuc, vuc);
     XVI8GER4_INTERNAL mma_xvi8ger4 {mma}
 
   void __builtin_mma_xvi8ger4pp (v512 *, vuc, vuc);
-    XVI8GER4PP nothing {mma,quad}
+    XVI8GER4PP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvi8ger4pp_internal (v512, vuc, vuc);
     XVI8GER4PP_INTERNAL mma_xvi8ger4pp {mma,quad}
 
   void __builtin_mma_xvi8ger4spp (v512 *, vuc, vuc);
-    XVI8GER4SPP nothing {mma,quad}
+    XVI8GER4SPP nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xvi8ger4spp_internal (v512, vuc, vuc);
     XVI8GER4SPP_INTERNAL mma_xvi8ger4spp {mma,quad}
 
   void __builtin_mma_xxmfacc (v512 *);
-    XXMFACC nothing {mma,quad}
+    XXMFACC nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xxmfacc_internal (v512);
     XXMFACC_INTERNAL mma_xxmfacc {mma,quad}
 
   void __builtin_mma_xxmtacc (v512 *);
-    XXMTACC nothing {mma,quad}
+    XXMTACC nothing {mma,quad,mmaint}
 
   v512 __builtin_mma_xxmtacc_internal (v512);
     XXMTACC_INTERNAL mma_xxmtacc {mma,quad}
 
   void __builtin_mma_xxsetaccz (v512 *);
-    XXSETACCZ nothing {mma}
+    XXSETACCZ nothing {mma,mmaint}
 
   v512 __builtin_mma_xxsetaccz_internal ();
     XXSETACCZ_INTERNAL mma_xxsetaccz {mma}
 
   void __builtin_vsx_assemble_pair (v256 *, vuc, vuc);
-    ASSEMBLE_PAIR_V nothing {mma}
+    ASSEMBLE_PAIR_V nothing {mma,mmaint}
 
   v256 __builtin_vsx_assemble_pair_internal (vuc, vuc);
     ASSEMBLE_PAIR_V_INTERNAL vsx_assemble_pair {mma}
 
   void __builtin_vsx_build_pair (v256 *, vuc, vuc);
-    BUILD_PAIR nothing {mma}
+    BUILD_PAIR nothing {mma,mmaint}
 
   v256 __builtin_vsx_build_pair_internal (vuc, vuc);
     BUILD_PAIR_INTERNAL vsx_assemble_pair {mma}
 
   void __builtin_vsx_disassemble_pair (void *, v256 *);
-    DISASSEMBLE_PAIR_V nothing {mma,pair}
+    DISASSEMBLE_PAIR_V nothing {mma,pair,mmaint}
 
   vuc __builtin_vsx_disassemble_pair_internal (v256, const int<2>);
     DISASSEMBLE_PAIR_V_INTERNAL vsx_disassemble_pair {mma}
+
+  v256 __builtin_vsx_lxvp (unsigned long, const v256 *);
+    LXVP nothing {mma}
+
+  void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
+    STXVP nothing {mma,pair}
diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index eae4e15df1e..558f06cfd6c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -13072,8 +13072,10 @@ rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
 
   /* Each call that can be gimple-expanded has an associated built-in
      function that it will expand into.  If this one doesn't, we have
-     already expanded it!  */
-  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
+     already expanded it!  Exceptions: lxvp and stxvp.  */
+  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE
+      && fncode != RS6000_BIF_LXVP
+      && fncode != RS6000_BIF_STXVP)
     return false;
 
   bifdata *bd = &rs6000_builtin_info_x[fncode];
@@ -13150,6 +13152,38 @@ rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
       gsi_replace_with_seq (gsi, new_seq, true);
       return true;
     }
+  else if (fncode == RS6000_BIF_LXVP)
+    {
+      push_gimplify_context (true);
+      tree offset = gimple_call_arg (stmt, 0);
+      tree ptr = gimple_call_arg (stmt, 1);
+      tree lhs = gimple_call_lhs (stmt);
+      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
+	ptr = build1 (VIEW_CONVERT_EXPR,
+		      build_pointer_type (vector_pair_type_node), ptr);
+      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
+					       TREE_TYPE (ptr), ptr, offset));
+      gimplify_assign (lhs, mem, &new_seq);
+      pop_gimplify_context (NULL);
+      gsi_replace_with_seq (gsi, new_seq, true);
+      return true;
+    }
+  else if (fncode == RS6000_BIF_STXVP)
+    {
+      push_gimplify_context (true);
+      tree src = gimple_call_arg (stmt, 0);
+      tree offset = gimple_call_arg (stmt, 1);
+      tree ptr = gimple_call_arg (stmt, 2);
+      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
+	ptr = build1 (VIEW_CONVERT_EXPR,
+		      build_pointer_type (vector_pair_type_node), ptr);
+      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
+					       TREE_TYPE (ptr), ptr, offset));
+      gimplify_assign (mem, src, &new_seq);
+      pop_gimplify_context (NULL);
+      gsi_replace_with_seq (gsi, new_seq, true);
+      return true;
+    }
 
   /* Convert this built-in into an internal version that uses pass-by-value
      arguments.  The internal built-in is found in the assoc_bif field.  */
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index f65932e1cd5..7f711210aff 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
      mma      Needs special handling for MMA instructions
      quad     MMA instruction using a register quad as an input operand
      pair     MMA instruction using a register pair as an input operand
+     mmaint   MMA instruction expanding to internal call at GIMPLE time
      no32bit  Not valid for TARGET_32BIT
      32bit    Requires different handling for TARGET_32BIT
      cpu      This is a "cpu_is" or "cpu_supports" builtin
@@ -369,6 +370,7 @@ struct attrinfo
   bool ismma;
   bool isquad;
   bool ispair;
+  bool ismmaint;
   bool isno32bit;
   bool is32bit;
   bool iscpu;
@@ -1363,6 +1365,8 @@ parse_bif_attrs (attrinfo *attrptr)
 	  attrptr->isquad = 1;
 	else if (!strcmp (attrname, "pair"))
 	  attrptr->ispair = 1;
+	else if (!strcmp (attrname, "mmaint"))
+	  attrptr->ismmaint = 1;
 	else if (!strcmp (attrname, "no32bit"))
 	  attrptr->isno32bit = 1;
 	else if (!strcmp (attrname, "32bit"))
@@ -1409,15 +1413,15 @@ parse_bif_attrs (attrinfo *attrptr)
   (*diag) ("attribute set: init = %d, set = %d, extract = %d, nosoft = %d, "
 	   "ldvec = %d, stvec = %d, reve = %d, pred = %d, htm = %d, "
 	   "htmspr = %d, htmcr = %d, mma = %d, quad = %d, pair = %d, "
-	   "no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, lxvrse = %d, "
-	   "lxvrze = %d, endian = %d.\n",
+	   "mmaint = %d, no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, "
+	   "lxvrse = %d, lxvrze = %d, endian = %d.\n",
 	   attrptr->isinit, attrptr->isset, attrptr->isextract,
 	   attrptr->isnosoft, attrptr->isldvec, attrptr->isstvec,
 	   attrptr->isreve, attrptr->ispred, attrptr->ishtm, attrptr->ishtmspr,
 	   attrptr->ishtmcr, attrptr->ismma, attrptr->isquad, attrptr->ispair,
-	   attrptr->isno32bit, attrptr->is32bit, attrptr->iscpu,
-	   attrptr->isldstmask, attrptr->islxvrse, attrptr->islxvrze,
-	   attrptr->isendian);
+	   attrptr->ismmaint, attrptr->isno32bit, attrptr->is32bit,
+	   attrptr->iscpu, attrptr->isldstmask, attrptr->islxvrse,
+	   attrptr->islxvrze, attrptr->isendian);
 #endif
 
   return PC_OK;
@@ -2223,13 +2227,14 @@ write_decls (void)
   fprintf (header_file, "#define bif_mma_bit\t\t(0x00000800)\n");
   fprintf (header_file, "#define bif_quad_bit\t\t(0x00001000)\n");
   fprintf (header_file, "#define bif_pair_bit\t\t(0x00002000)\n");
-  fprintf (header_file, "#define bif_no32bit_bit\t\t(0x00004000)\n");
-  fprintf (header_file, "#define bif_32bit_bit\t\t(0x00008000)\n");
-  fprintf (header_file, "#define bif_cpu_bit\t\t(0x00010000)\n");
-  fprintf (header_file, "#define bif_ldstmask_bit\t(0x00020000)\n");
-  fprintf (header_file, "#define bif_lxvrse_bit\t\t(0x00040000)\n");
-  fprintf (header_file, "#define bif_lxvrze_bit\t\t(0x00080000)\n");
-  fprintf (header_file, "#define bif_endian_bit\t\t(0x00100000)\n");
+  fprintf (header_file, "#define bif_mmaint_bit\t\t(0x00004000)\n");
+  fprintf (header_file, "#define bif_no32bit_bit\t\t(0x00008000)\n");
+  fprintf (header_file, "#define bif_32bit_bit\t\t(0x00010000)\n");
+  fprintf (header_file, "#define bif_cpu_bit\t\t(0x00020000)\n");
+  fprintf (header_file, "#define bif_ldstmask_bit\t(0x00040000)\n");
+  fprintf (header_file, "#define bif_lxvrse_bit\t\t(0x00080000)\n");
+  fprintf (header_file, "#define bif_lxvrze_bit\t\t(0x00100000)\n");
+  fprintf (header_file, "#define bif_endian_bit\t\t(0x00200000)\n");
   fprintf (header_file, "\n");
   fprintf (header_file,
 	   "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
@@ -2259,6 +2264,8 @@ write_decls (void)
 	   "#define bif_is_quad(x)\t\t((x).bifattrs & bif_quad_bit)\n");
   fprintf (header_file,
 	   "#define bif_is_pair(x)\t\t((x).bifattrs & bif_pair_bit)\n");
+  fprintf (header_file,
+	   "#define bif_is_mmaint(x)\t\t((x).bifattrs & bif_mmaint_bit)\n");
   fprintf (header_file,
 	   "#define bif_is_no32bit(x)\t((x).bifattrs & bif_no32bit_bit)\n");
   fprintf (header_file,
@@ -2491,6 +2498,8 @@ write_bif_static_init (void)
 	fprintf (init_file, " | bif_quad_bit");
       if (bifp->attrs.ispair)
 	fprintf (init_file, " | bif_pair_bit");
+      if (bifp->attrs.ismmaint)
+	fprintf (init_file, " | bif_mmaint_bit");
       if (bifp->attrs.isno32bit)
 	fprintf (init_file, " | bif_no32bit_bit");
       if (bifp->attrs.is32bit)
@@ -2537,10 +2546,9 @@ write_bif_static_init (void)
 		: (bifp->kind == FNK_PURE ? "= pure"
 		   : (bifp->kind == FNK_FPMATH ? "= fp, const"
 		      : ""))));
-      bool no_icode = !strcmp (bifp->patname, "nothing");
       fprintf (init_file, "      /* assoc_bif */\tRS6000_BIF_%s%s\n",
-	       bifp->attrs.ismma && no_icode ? bifp->idname : "NONE",
-	       bifp->attrs.ismma && no_icode ? "_INTERNAL" : "");
+	       bifp->attrs.ismmaint ? bifp->idname : "NONE",
+	       bifp->attrs.ismmaint ? "_INTERNAL" : "");
       fprintf (init_file, "    },\n");
     }
   fprintf (init_file, "  };\n\n");
-- 
2.27.0
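
As a usage sketch (hypothetical code, not part of the patch): with the
new lxvp/stxvp entries and the GIMPLE folding added above in
rs6000_gimple_fold_new_mma_builtin, both calls below fold into plain
__vector_pair memory accesses that later passes can optimize.

  /* Assumes a Power10 compiler with MMA support enabled.  */
  void
  copy_pair (__vector_pair *dst, const __vector_pair *src)
  {
    __vector_pair tmp = __builtin_vsx_lxvp (0, src);
    __builtin_vsx_stxvp (tmp, 0, dst);
  }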



* [PATCH 05/18] rs6000: Support for vectorizing built-in functions
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (3 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 04/18] rs6000: Handle some recent MMA builtin changes Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-09-13 19:29   ` will schmidt
  2021-09-17 12:17   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 06/18] rs6000: Builtin expansion, part 1 Bill Schmidt
                   ` (13 subsequent siblings)
  18 siblings, 2 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

This patch just duplicates a couple of functions and adjusts them to use the
new builtin names.  There's no logical change otherwise.
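
As an illustration (hypothetical code, not part of this patch), these
hooks serve loops like the following: when vectorizing for VSX, the
vectorizer asks for a V2DFmode version of floor, and the new function
returns rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM].

  double x[1024], y[1024];

  void
  floor_loop (void)
  {
    for (int i = 0; i < 1024; i++)
      y[i] = __builtin_floor (x[i]);	/* CASE_CFN_FLOOR in the patch.  */
  }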

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
	(rs6000_new_builtin_vectorized_function): New function.
	(rs6000_new_builtin_md_vectorized_function): Likewise.
	(rs6000_builtin_vectorized_function): Call
	rs6000_new_builtin_vectorized_function.
	(rs6000_builtin_md_vectorized_function): Call
	rs6000_new_builtin_md_vectorized_function.
---
 gcc/config/rs6000/rs6000.c | 253 +++++++++++++++++++++++++++++++++++++
 1 file changed, 253 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b7ea1483da5..52c78c7500c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -78,6 +78,7 @@
 #include "case-cfn-macros.h"
 #include "ppc-auxv.h"
 #include "rs6000-internal.h"
+#include "rs6000-builtins.h"
 #include "opts.h"
 
 /* This file should be included last.  */
@@ -5501,6 +5502,251 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
   return nunroll;
 }
 
+/* Returns a function decl for a vectorized version of the builtin function
+   with builtin function code FN and the result vector type TYPE, or NULL_TREE
+   if it is not available.  */
+
+static tree
+rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
+					tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
+	     combined_fn_name (combined_fn (fn)),
+	     GET_MODE_NAME (TYPE_MODE (type_out)),
+	     GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+  switch (fn)
+    {
+    CASE_CFN_COPYSIGN:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF];
+      break;
+    CASE_CFN_CEIL:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VRFIP];
+      break;
+    CASE_CFN_FLOOR:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VRFIM];
+      break;
+    CASE_CFN_FMA:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP];
+      break;
+    CASE_CFN_TRUNC:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIZ];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIZ];
+      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VRFIZ];
+      break;
+    CASE_CFN_NEARBYINT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && flag_unsafe_math_optimizations
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPI];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && flag_unsafe_math_optimizations
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPI];
+      break;
+    CASE_CFN_RINT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && !flag_trapping_math
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIC];
+      if (VECTOR_UNIT_VSX_P (V4SFmode)
+	  && !flag_trapping_math
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIC];
+      break;
+    default:
+      break;
+    }
+
+  /* Generate calls to libmass if appropriate.  */
+  if (rs6000_veclib_handler)
+    return rs6000_veclib_handler (combined_fn (fn), type_out, type_in);
+
+  return NULL_TREE;
+}
+
+/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
+
+static tree
+rs6000_new_builtin_md_vectorized_function (tree fndecl, tree type_out,
+					   tree type_in)
+{
+  machine_mode in_mode, out_mode;
+  int in_n, out_n;
+
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr,
+	     "rs6000_new_builtin_md_vectorized_function (%s, %s, %s)\n",
+	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
+	     GET_MODE_NAME (TYPE_MODE (type_out)),
+	     GET_MODE_NAME (TYPE_MODE (type_in)));
+
+  if (TREE_CODE (type_out) != VECTOR_TYPE
+      || TREE_CODE (type_in) != VECTOR_TYPE)
+    return NULL_TREE;
+
+  out_mode = TYPE_MODE (TREE_TYPE (type_out));
+  out_n = TYPE_VECTOR_SUBPARTS (type_out);
+  in_mode = TYPE_MODE (TREE_TYPE (type_in));
+  in_n = TYPE_VECTOR_SUBPARTS (type_in);
+
+  enum rs6000_gen_builtins fn
+    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  switch (fn)
+    {
+    case RS6000_BIF_RSQRTF:
+      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VRSQRTFP];
+      break;
+    case RS6000_BIF_RSQRT:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
+      break;
+    case RS6000_BIF_RECIPF:
+      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
+	  && out_mode == SFmode && out_n == 4
+	  && in_mode == SFmode && in_n == 4)
+	return rs6000_builtin_decls_x[RS6000_BIF_VRECIPFP];
+      break;
+    case RS6000_BIF_RECIP:
+      if (VECTOR_UNIT_VSX_P (V2DFmode)
+	  && out_mode == DFmode && out_n == 2
+	  && in_mode == DFmode && in_n == 2)
+	return rs6000_builtin_decls_x[RS6000_BIF_RECIP_V2DF];
+      break;
+    default:
+      break;
+    }
+
+  machine_mode in_vmode = TYPE_MODE (type_in);
+  machine_mode out_vmode = TYPE_MODE (type_out);
+
+  /* Power10 supported vectorized built-in functions.  */
+  if (TARGET_POWER10
+      && in_vmode == out_vmode
+      && VECTOR_UNIT_ALTIVEC_OR_VSX_P (in_vmode))
+    {
+      machine_mode exp_mode = DImode;
+      machine_mode exp_vmode = V2DImode;
+      enum rs6000_gen_builtins bif;
+      switch (fn)
+	{
+	case RS6000_BIF_DIVWE:
+	case RS6000_BIF_DIVWEU:
+	  exp_mode = SImode;
+	  exp_vmode = V4SImode;
+	  if (fn == RS6000_BIF_DIVWE)
+	    bif = RS6000_BIF_VDIVESW;
+	  else
+	    bif = RS6000_BIF_VDIVEUW;
+	  break;
+	case RS6000_BIF_DIVDE:
+	case RS6000_BIF_DIVDEU:
+	  if (fn == RS6000_BIF_DIVDE)
+	    bif = RS6000_BIF_VDIVESD;
+	  else
+	    bif = RS6000_BIF_VDIVEUD;
+	  break;
+	case RS6000_BIF_CFUGED:
+	  bif = RS6000_BIF_VCFUGED;
+	  break;
+	case RS6000_BIF_CNTLZDM:
+	  bif = RS6000_BIF_VCLZDM;
+	  break;
+	case RS6000_BIF_CNTTZDM:
+	  bif = RS6000_BIF_VCTZDM;
+	  break;
+	case RS6000_BIF_PDEPD:
+	  bif = RS6000_BIF_VPDEPD;
+	  break;
+	case RS6000_BIF_PEXTD:
+	  bif = RS6000_BIF_VPEXTD;
+	  break;
+	default:
+	  return NULL_TREE;
+	}
+
+      if (in_mode == exp_mode && in_vmode == exp_vmode)
+	return rs6000_builtin_decls_x[bif];
+    }
+
+  return NULL_TREE;
+}
+
 /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
    library with vectorized intrinsics.  */
 
@@ -5620,6 +5866,9 @@ rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
   machine_mode in_mode, out_mode;
   int in_n, out_n;
 
+  if (new_builtins_are_live)
+    return rs6000_new_builtin_vectorized_function (fn, type_out, type_in);
+
   if (TARGET_DEBUG_BUILTIN)
     fprintf (stderr, "rs6000_builtin_vectorized_function (%s, %s, %s)\n",
 	     combined_fn_name (combined_fn (fn)),
@@ -5751,6 +6000,10 @@ rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
   machine_mode in_mode, out_mode;
   int in_n, out_n;
 
+  if (new_builtins_are_live)
+    return rs6000_new_builtin_md_vectorized_function (fndecl, type_out,
+						      type_in);
+
   if (TARGET_DEBUG_BUILTIN)
     fprintf (stderr, "rs6000_builtin_md_vectorized_function (%s, %s, %s)\n",
 	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
-- 
2.27.0



* [PATCH 06/18] rs6000: Builtin expansion, part 1
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (4 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 05/18] rs6000: Support for vectorizing built-in functions Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-10-31  3:24   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 07/18] rs6000: Builtin expansion, part 2 Bill Schmidt
                   ` (12 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

This patch and the subsequent five patches form the meat of the improvements
for this patch series.  We develop a replacement for rs6000_expand_builtin
and its supporting functions, which are inefficient and difficult to
maintain.  This patch implements rs6000_expand_new_builtin, and creates
stubs for the support functions that subsequent patches will fill out.

Differences between the old and new support in this patch include:
 - Make use of the new builtin data structures, directly looking up
   a function's information rather than searching for the function
   multiple times;
 - Test for enablement of builtins at expand time, to support #pragma
   target changes within a compilation unit;
 - Use the builtin function attributes (e.g., bif_is_cpu) to control
   special handling;
 - Refactor common code into one place; and
 - Provide common error handling in one place for operands that are
   restricted to specific values or ranges (see the sketch below).

Note that these six patches must be pushed together, because otherwise
unused parameter warnings in the stub functions will prevent bootstrap.
If preferred, I can flag them unused to remove this restriction.
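
As a sketch of the common operand error handling mentioned above
(hypothetical user code, not from the patch), a bad literal now gets
one uniform diagnostic, with wording from the RES_BITS case in
rs6000_expand_new_builtin:

  #include <altivec.h>

  vector int
  bad_shift (vector int a, vector int b)
  {
    /* The third operand must be a 4-bit unsigned literal; 42 is not,
       so expansion reports:
	 argument 3 must be a 4-bit unsigned literal  */
    return vec_sld (a, b, 42);
  }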

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (rs6000_expand_new_builtin): New
	forward decl.
	(rs6000_invalid_new_builtin): New stub function.
	(rs6000_expand_builtin): Call rs6000_expand_new_builtin.
	(rs6000_expand_ldst_mask): New stub function.
	(new_cpu_expand_builtin): Likewise.
	(elemrev_icode): Likewise.
	(ldv_expand_builtin): Likewise.
	(lxvrse_expand_builtin): Likewise.
	(lxvrze_expand_builtin): Likewise.
	(stv_expand_builtin): Likewise.
	(new_mma_expand_builtin): Likewise.
	(new_htm_expand_builtin): Likewise.
	(rs6000_expand_new_builtin): New function.
---
 gcc/config/rs6000/rs6000-call.c | 432 ++++++++++++++++++++++++++++++++
 1 file changed, 432 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 558f06cfd6c..583efc9e98e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
 static void rs6000_common_init_builtins (void);
 static void htm_init_builtins (void);
 static void mma_init_builtins (void);
+static rtx rs6000_expand_new_builtin (tree, rtx, rtx, machine_mode, int);
 static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
 
 
@@ -11664,6 +11665,14 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
     error ("%qs is not supported with the current options", name);
 }
 
+/* Raise an error message for a builtin function that is called without the
+   appropriate target options being set.  */
+
+static void
+rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
+{
+}
+
 /* Target hook for early folding of built-ins, shamelessly stolen
    from ia64.c.  */
 
@@ -14234,6 +14243,9 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 		       machine_mode mode ATTRIBUTE_UNUSED,
 		       int ignore ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+    return rs6000_expand_new_builtin (exp, target, subtarget, mode, ignore);
+
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   enum rs6000_builtins fcode
     = (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
@@ -14526,6 +14538,426 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
   gcc_unreachable ();
 }
 
+/* Expand ALTIVEC_BUILTIN_MASK_FOR_LOAD.  */
+rtx
+rs6000_expand_ldst_mask (rtx target, tree arg0)
+{
+  return target;
+}
+
+/* Expand the CPU builtin in FCODE and store the result in TARGET.  */
+static rtx
+new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
+			tree exp ATTRIBUTE_UNUSED, rtx target)
+{
+  return target;
+}
+
+static insn_code
+elemrev_icode (rs6000_gen_builtins fcode)
+{
+  return (insn_code) 0;
+}
+
+static rtx
+ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
+{
+  return target;
+}
+
+static rtx
+lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
+		       machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+lxvrze_expand_builtin (rtx target, insn_code icode, rtx *op,
+		       machine_mode tmode, machine_mode smode)
+{
+  return target;
+}
+
+static rtx
+stv_expand_builtin (insn_code icode, rtx *op,
+		    machine_mode tmode, machine_mode smode)
+{
+  return NULL_RTX;
+}
+
+/* Expand the MMA built-in in EXP.  */
+static rtx
+new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
+			rs6000_gen_builtins fcode)
+{
+  return target;
+}
+
+/* Expand the HTM builtin in EXP and store the result in TARGET.  */
+static rtx
+new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
+			tree exp, rtx target)
+{
+  return const0_rtx;
+}
+
+/* Expand an expression EXP that calls a built-in function,
+   with result going to TARGET if that's convenient
+   (and in mode MODE if that's convenient).
+   SUBTARGET may be used as the target for computing one of EXP's operands.
+   IGNORE is nonzero if the value is to be ignored.
+   Use the new builtin infrastructure.  */
+static rtx
+rs6000_expand_new_builtin (tree exp, rtx target,
+			   rtx subtarget ATTRIBUTE_UNUSED,
+			   machine_mode ignore_mode ATTRIBUTE_UNUSED,
+			   int ignore ATTRIBUTE_UNUSED)
+{
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  enum rs6000_gen_builtins fcode
+    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
+  size_t uns_fcode = (size_t) fcode;
+  enum insn_code icode = rs6000_builtin_info_x[uns_fcode].icode;
+
+  /* We have two different modes (KFmode, TFmode) that are the IEEE
+     128-bit floating point type, depending on whether long double is the
+     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).
+     It is simpler if we only define one variant of the built-in function,
+     and switch the code when defining it, rather than defining two built-
+     ins and using the overload table in rs6000-c.c to switch between the
+     two.  If we don't have the proper assembler, don't do this switch
+     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
+  if (FLOAT128_IEEE_P (TFmode))
+    switch (icode)
+      {
+      default:
+	break;
+      case CODE_FOR_sqrtkf2_odd:
+	icode = CODE_FOR_sqrttf2_odd;
+	break;
+      case CODE_FOR_trunckfdf2_odd:
+	icode = CODE_FOR_trunctfdf2_odd;
+	break;
+      case CODE_FOR_addkf3_odd:
+	icode = CODE_FOR_addtf3_odd;
+	break;
+      case CODE_FOR_subkf3_odd:
+	icode = CODE_FOR_subtf3_odd;
+	break;
+      case CODE_FOR_mulkf3_odd:
+	icode = CODE_FOR_multf3_odd;
+	break;
+      case CODE_FOR_divkf3_odd:
+	icode = CODE_FOR_divtf3_odd;
+	break;
+      case CODE_FOR_fmakf4_odd:
+	icode = CODE_FOR_fmatf4_odd;
+	break;
+      case CODE_FOR_xsxexpqp_kf:
+	icode = CODE_FOR_xsxexpqp_tf;
+	break;
+      case CODE_FOR_xsxsigqp_kf:
+	icode = CODE_FOR_xsxsigqp_tf;
+	break;
+      case CODE_FOR_xststdcnegqp_kf:
+	icode = CODE_FOR_xststdcnegqp_tf;
+	break;
+      case CODE_FOR_xsiexpqp_kf:
+	icode = CODE_FOR_xsiexpqp_tf;
+	break;
+      case CODE_FOR_xsiexpqpf_kf:
+	icode = CODE_FOR_xsiexpqpf_tf;
+	break;
+      case CODE_FOR_xststdcqp_kf:
+	icode = CODE_FOR_xststdcqp_tf;
+	break;
+      case CODE_FOR_xscmpexpqp_eq_kf:
+	icode = CODE_FOR_xscmpexpqp_eq_tf;
+	break;
+      case CODE_FOR_xscmpexpqp_lt_kf:
+	icode = CODE_FOR_xscmpexpqp_lt_tf;
+	break;
+      case CODE_FOR_xscmpexpqp_gt_kf:
+	icode = CODE_FOR_xscmpexpqp_gt_tf;
+	break;
+      case CODE_FOR_xscmpexpqp_unordered_kf:
+	icode = CODE_FOR_xscmpexpqp_unordered_tf;
+	break;
+      }
+
+  /* In case of "#pragma target" changes, we initialize all builtins
+     but check for actual availability now, during expand time.  For
+     invalid builtins, generate a normal call.  */
+  bifdata *bifaddr = &rs6000_builtin_info_x[uns_fcode];
+  bif_enable e = bifaddr->enable;
+
+  if (e != ENB_ALWAYS
+      && (e != ENB_P5       || !TARGET_POPCNTB)
+      && (e != ENB_P6       || !TARGET_CMPB)
+      && (e != ENB_ALTIVEC  || !TARGET_ALTIVEC)
+      && (e != ENB_CELL     || !TARGET_ALTIVEC || rs6000_cpu != PROCESSOR_CELL)
+      && (e != ENB_VSX      || !TARGET_VSX)
+      && (e != ENB_P7       || !TARGET_POPCNTD)
+      && (e != ENB_P7_64    || !TARGET_POPCNTD || !TARGET_POWERPC64)
+      && (e != ENB_P8       || !TARGET_DIRECT_MOVE)
+      && (e != ENB_P8V      || !TARGET_P8_VECTOR)
+      && (e != ENB_P9       || !TARGET_MODULO)
+      && (e != ENB_P9_64    || !TARGET_MODULO || !TARGET_POWERPC64)
+      && (e != ENB_P9V      || !TARGET_P9_VECTOR)
+      && (e != ENB_IEEE128_HW || !TARGET_FLOAT128_HW)
+      && (e != ENB_DFP      || !TARGET_DFP)
+      && (e != ENB_CRYPTO   || !TARGET_CRYPTO)
+      && (e != ENB_HTM      || !TARGET_HTM)
+      && (e != ENB_P10      || !TARGET_POWER10)
+      && (e != ENB_P10_64   || !TARGET_POWER10 || !TARGET_POWERPC64)
+      && (e != ENB_MMA      || !TARGET_MMA))
+    {
+      rs6000_invalid_new_builtin (fcode);
+      return expand_call (exp, target, ignore);
+    }
+
+  if (bif_is_nosoft (*bifaddr)
+      && rs6000_isa_flags & OPTION_MASK_SOFT_FLOAT)
+    {
+      error ("%<%s%> not supported with %<-msoft-float%>",
+	     bifaddr->bifname);
+      return const0_rtx;
+    }
+
+  if (bif_is_no32bit (*bifaddr) && TARGET_32BIT)
+    fatal_error (input_location,
+		 "%<%s%> is not supported in 32-bit mode",
+		 bifaddr->bifname);
+
+  if (bif_is_cpu (*bifaddr))
+    return new_cpu_expand_builtin (fcode, exp, target);
+
+  if (bif_is_init (*bifaddr))
+    return altivec_expand_vec_init_builtin (TREE_TYPE (exp), exp, target);
+
+  if (bif_is_set (*bifaddr))
+    return altivec_expand_vec_set_builtin (exp);
+
+  if (bif_is_extract (*bifaddr))
+    return altivec_expand_vec_ext_builtin (exp, target);
+
+  if (bif_is_predicate (*bifaddr))
+    return altivec_expand_predicate_builtin (icode, exp, target);
+
+  if (bif_is_htm (*bifaddr))
+    return new_htm_expand_builtin (bifaddr, fcode, exp, target);
+
+  rtx pat;
+  const int MAX_BUILTIN_ARGS = 6;
+  tree arg[MAX_BUILTIN_ARGS];
+  rtx op[MAX_BUILTIN_ARGS];
+  machine_mode mode[MAX_BUILTIN_ARGS + 1];
+  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
+  int k;
+
+  int nargs = bifaddr->nargs;
+  gcc_assert (nargs <= MAX_BUILTIN_ARGS);
+
+  if (void_func)
+    k = 0;
+  else
+    {
+      k = 1;
+      mode[0] = insn_data[icode].operand[0].mode;
+    }
+
+  for (int i = 0; i < nargs; i++)
+    {
+      arg[i] = CALL_EXPR_ARG (exp, i);
+      if (arg[i] == error_mark_node)
+	return const0_rtx;
+      STRIP_NOPS (arg[i]);
+      op[i] = expand_normal (arg[i]);
+      /* We have a couple of pesky patterns that don't specify the mode...  */
+      if (!insn_data[icode].operand[i+k].mode)
+	mode[i+k] = TARGET_64BIT ? Pmode : SImode;
+      else
+	mode[i+k] = insn_data[icode].operand[i+k].mode;
+    }
+
+  /* Check for restricted constant arguments.  */
+  for (int i = 0; i < 2; i++)
+    {
+      switch (bifaddr->restr[i])
+	{
+	default:
+	case RES_NONE:
+	  break;
+	case RES_BITS:
+	  {
+	    size_t mask = (1 << bifaddr->restr_val1[i]) - 1;
+	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
+	    STRIP_NOPS (restr_arg);
+	    if (TREE_CODE (restr_arg) != INTEGER_CST
+		|| TREE_INT_CST_LOW (restr_arg) & ~mask)
+	      {
+		error ("argument %d must be a %d-bit unsigned literal",
+		       bifaddr->restr_opnd[i], bifaddr->restr_val1[i]);
+		return CONST0_RTX (mode[0]);
+	      }
+	    break;
+	  }
+	case RES_RANGE:
+	  {
+	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
+	    STRIP_NOPS (restr_arg);
+	    if (TREE_CODE (restr_arg) != INTEGER_CST
+		|| !IN_RANGE (tree_to_shwi (restr_arg),
+			      bifaddr->restr_val1[i],
+			      bifaddr->restr_val2[i]))
+	      {
+		error ("argument %d must be a literal between %d and %d,"
+		       " inclusive",
+		       bifaddr->restr_opnd[i], bifaddr->restr_val1[i],
+		       bifaddr->restr_val2[i]);
+		return CONST0_RTX (mode[0]);
+	      }
+	    break;
+	  }
+	case RES_VAR_RANGE:
+	  {
+	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
+	    STRIP_NOPS (restr_arg);
+	    if (TREE_CODE (restr_arg) == INTEGER_CST
+		&& !IN_RANGE (tree_to_shwi (restr_arg),
+			      bifaddr->restr_val1[i],
+			      bifaddr->restr_val2[i]))
+	      {
+		error ("argument %d must be a variable or a literal "
+		       "between %d and %d, inclusive",
+		       bifaddr->restr_opnd[i], bifaddr->restr_val1[i],
+		       bifaddr->restr_val2[i]);
+		return CONST0_RTX (mode[0]);
+	      }
+	    break;
+	  }
+	case RES_VALUES:
+	  {
+	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
+	    STRIP_NOPS (restr_arg);
+	    if (TREE_CODE (restr_arg) != INTEGER_CST
+		|| (tree_to_shwi (restr_arg) != bifaddr->restr_val1[i]
+		    && tree_to_shwi (restr_arg) != bifaddr->restr_val2[i]))
+	      {
+		error ("argument %d must be either a literal %d or a "
+		       "literal %d",
+		       bifaddr->restr_opnd[i], bifaddr->restr_val1[i],
+		       bifaddr->restr_val2[i]);
+		return CONST0_RTX (mode[0]);
+	      }
+	    break;
+	  }
+	}
+    }
+
+  if (bif_is_ldstmask (*bifaddr))
+    return rs6000_expand_ldst_mask (target, arg[0]);
+
+  if (bif_is_stvec (*bifaddr))
+    {
+      if (bif_is_reve (*bifaddr))
+	icode = elemrev_icode (fcode);
+      return stv_expand_builtin (icode, op, mode[0], mode[1]);
+    }
+
+  if (bif_is_ldvec (*bifaddr))
+    {
+      if (bif_is_reve (*bifaddr))
+	icode = elemrev_icode (fcode);
+      return ldv_expand_builtin (target, icode, op, mode[0]);
+    }
+
+  if (bif_is_lxvrse (*bifaddr))
+    return lxvrse_expand_builtin (target, icode, op, mode[0], mode[1]);
+
+  if (bif_is_lxvrze (*bifaddr))
+    return lxvrze_expand_builtin (target, icode, op, mode[0], mode[1]);
+
+  if (bif_is_mma (*bifaddr))
+    return new_mma_expand_builtin (exp, target, icode, fcode);
+
+  if (fcode == RS6000_BIF_PACK_IF
+      && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
+    {
+      icode = CODE_FOR_packtf;
+      fcode = RS6000_BIF_PACK_TF;
+      uns_fcode = (size_t) fcode;
+    }
+  else if (fcode == RS6000_BIF_UNPACK_IF
+	   && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
+    {
+      icode = CODE_FOR_unpacktf;
+      fcode = RS6000_BIF_UNPACK_TF;
+      uns_fcode = (size_t) fcode;
+    }
+
+  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
+    target = NULL_RTX;
+  else if (target == 0
+	   || GET_MODE (target) != mode[0]
+	   || !insn_data[icode].operand[0].predicate (target, mode[0]))
+    target = gen_reg_rtx (mode[0]);
+
+  for (int i = 0; i < nargs; i++)
+    if (!insn_data[icode].operand[i+k].predicate (op[i], mode[i+k]))
+      op[i] = copy_to_mode_reg (mode[i+k], op[i]);
+
+  switch (nargs)
+    {
+    default:
+      gcc_assert (MAX_BUILTIN_ARGS == 6);
+      gcc_unreachable ();
+    case 0:
+      pat = (void_func
+	     ? GEN_FCN (icode) ()
+	     : GEN_FCN (icode) (target));
+      break;
+    case 1:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0])
+	     : GEN_FCN (icode) (target, op[0]));
+      break;
+    case 2:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0], op[1])
+	     : GEN_FCN (icode) (target, op[0], op[1]));
+      break;
+    case 3:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0], op[1], op[2])
+	     : GEN_FCN (icode) (target, op[0], op[1], op[2]));
+      break;
+    case 4:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0], op[1], op[2], op[3])
+	     : GEN_FCN (icode) (target, op[0], op[1], op[2], op[3]));
+      break;
+    case 5:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4])
+	     : GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]));
+      break;
+    case 6:
+      pat = (void_func
+	     ? GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5])
+	     : GEN_FCN (icode) (target, op[0], op[1],
+				op[2], op[3], op[4], op[5]));
+      break;
+    }
+
+  if (!pat)
+    return 0;
+  emit_insn (pat);
+
+  return target;
+}
+
 /* Create a builtin vector type with a name.  Taking care not to give
    the canonical type a name.  */
 
-- 
2.27.0



* [PATCH 07/18] rs6000: Builtin expansion, part 2
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (5 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 06/18] rs6000: Builtin expansion, part 1 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-01 12:18   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 08/18] rs6000: Builtin expansion, part 3 Bill Schmidt
                   ` (11 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Implement rs6000_invalid_new_builtin, which issues the appropriate error
message when a built-in function is used without the options that enable
it.  Also implement
rs6000_expand_ldst_mask, which just factors out the code that handles
ALTIVEC_BUILTIN_MASK_FOR_LOAD in the old rs6000_expand_builtin.  Finally,
ensure the variable altivec_builtin_mask_for_load is initialized.
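
For example (hypothetical code, not part of the patch; assumes the
__vector_quad type is still parseable, as it is now that all built-ins
are registered up front), compiling this without -mmma trips the
ENB_MMA case below:

  void
  zero_acc (__vector_quad *acc)
  {
    /* error: '__builtin_mma_xxsetaccz' requires the '-mmma' option */
    __builtin_mma_xxsetaccz (acc);
  }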

2021-09-01  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin):
	Implement.
	(rs6000_expand_ldst_mask): Likewise.
	(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.
---
 gcc/config/rs6000/rs6000-call.c | 101 +++++++++++++++++++++++++++++++-
 1 file changed, 100 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 583efc9e98e..3e0ab42317b 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -11671,6 +11671,75 @@ rs6000_invalid_builtin (enum rs6000_builtins fncode)
 static void
 rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
 {
+  size_t uns_fncode = (size_t) fncode;
+  const char *name = rs6000_builtin_info_x[uns_fncode].bifname;
+
+  switch (rs6000_builtin_info_x[uns_fncode].enable)
+    {
+    case ENB_P5:
+      error ("%qs requires the %qs option", name, "-mcpu=power5");
+      break;
+    case ENB_P6:
+      error ("%qs requires the %qs option", name, "-mcpu=power6");
+      break;
+    case ENB_ALTIVEC:
+      error ("%qs requires the %qs option", name, "-maltivec");
+      break;
+    case ENB_CELL:
+      error ("%qs is only valid for the cell processor", name);
+      break;
+    case ENB_VSX:
+      error ("%qs requires the %qs option", name, "-mvsx");
+      break;
+    case ENB_P7:
+      error ("%qs requires the %qs option", name, "-mcpu=power7");
+      break;
+    case ENB_P7_64:
+      error ("%qs requires the %qs option and either the %qs or %qs option",
+	     name, "-mcpu=power7", "-m64", "-mpowerpc64");
+      break;
+    case ENB_P8:
+      error ("%qs requires the %qs option", name, "-mcpu=power8");
+      break;
+    case ENB_P8V:
+      error ("%qs requires the %qs option", name, "-mpower8-vector");
+      break;
+    case ENB_P9:
+      error ("%qs requires the %qs option", name, "-mcpu=power9");
+      break;
+    case ENB_P9_64:
+      error ("%qs requires the %qs option and either the %qs or %qs option",
+	     name, "-mcpu=power9", "-m64", "-mpowerpc64");
+      break;
+    case ENB_P9V:
+      error ("%qs requires the %qs option", name, "-mpower9-vector");
+      break;
+    case ENB_IEEE128_HW:
+      error ("%qs requires ISA 3.0 IEEE 128-bit floating point", name);
+      break;
+    case ENB_DFP:
+      error ("%qs requires the %qs option", name, "-mhard-dfp");
+      break;
+    case ENB_CRYPTO:
+      error ("%qs requires the %qs option", name, "-mcrypto");
+      break;
+    case ENB_HTM:
+      error ("%qs requires the %qs option", name, "-mhtm");
+      break;
+    case ENB_P10:
+      error ("%qs requires the %qs option", name, "-mcpu=power10");
+      break;
+    case ENB_P10_64:
+      error ("%qs requires the %qs option and either the %qs or %qs option",
+	     name, "-mcpu=power10", "-m64", "-mpowerpc64");
+      break;
+    case ENB_MMA:
+      error ("%qs requires the %qs option", name, "-mmma");
+      break;
+    default:
+    case ENB_ALWAYS:
+      gcc_unreachable ();
+    };
 }
 
 /* Target hook for early folding of built-ins, shamelessly stolen
@@ -14542,7 +14611,34 @@ rs6000_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
 rtx
 rs6000_expand_ldst_mask (rtx target, tree arg0)
 {
-  return target;
+  int icode2 = BYTES_BIG_ENDIAN
+    ? (int) CODE_FOR_altivec_lvsr_direct
+    : (int) CODE_FOR_altivec_lvsl_direct;
+  machine_mode tmode = insn_data[icode2].operand[0].mode;
+  machine_mode mode = insn_data[icode2].operand[1].mode;
+  rtx op, addr, pat;
+
+  gcc_assert (TARGET_ALTIVEC);
+
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (arg0)));
+  op = expand_expr (arg0, NULL_RTX, Pmode, EXPAND_NORMAL);
+  addr = memory_address (mode, op);
+  /* We need to negate the address.  */
+  op = gen_reg_rtx (GET_MODE (addr));
+  emit_insn (gen_rtx_SET (op, gen_rtx_NEG (GET_MODE (addr), addr)));
+  op = gen_rtx_MEM (mode, op);
+
+  if (target == 0
+      || GET_MODE (target) != tmode
+      || !insn_data[icode2].operand[0].predicate (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  pat = GEN_FCN (icode2) (target, op);
+  if (!pat)
+    return 0;
+  emit_insn (pat);
+
+  return target;
 }
 
 /* Expand the CPU builtin in FCODE and store the result in TARGET.  */
@@ -15351,6 +15447,9 @@ rs6000_init_builtins (void)
 
   if (new_builtins_are_live)
     {
+      altivec_builtin_mask_for_load
+	= rs6000_builtin_decls_x[RS6000_BIF_MASK_FOR_LOAD];
+
 #ifdef SUBTARGET_INIT_BUILTINS
       SUBTARGET_INIT_BUILTINS;
 #endif
-- 
2.27.0



* [PATCH 08/18] rs6000: Builtin expansion, part 3
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (6 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 07/18] rs6000: Builtin expansion, part 2 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-03  1:15   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 09/18] rs6000: Builtin expansion, part 4 Bill Schmidt
                   ` (10 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Implement the replacement for cpu_expand_builtin.  There are no logic
changes here, only changes to use the new built-in function names and
some formatting cleanups.
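
User-facing behavior is unchanged; as a reminder of what is being
expanded (standard usage, not new code), the CPU built-ins are used as
below, and with TARGET_LIBC_PROVIDES_HWCAP_IN_TCB they become direct
loads from the HWCAP words GLIBC caches in the TCB:

  #include <stdio.h>

  int
  main (void)
  {
    if (__builtin_cpu_is ("power9"))
      puts ("running on a POWER9");
    if (__builtin_cpu_supports ("vsx"))
      puts ("VSX available");
    return 0;
  }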

2021-09-01  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (new_cpu_expand_builtin):
	Implement.
---
 gcc/config/rs6000/rs6000-call.c | 102 ++++++++++++++++++++++++++++++++
 1 file changed, 102 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 3e0ab42317b..5032e947a8e 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14646,6 +14646,108 @@ static rtx
 new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
 			tree exp ATTRIBUTE_UNUSED, rtx target)
 {
+  /* __builtin_cpu_init () is a nop, so expand to nothing.  */
+  if (fcode == RS6000_BIF_CPU_INIT)
+    return const0_rtx;
+
+  if (target == 0 || GET_MODE (target) != SImode)
+    target = gen_reg_rtx (SImode);
+
+#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
+  tree arg = TREE_OPERAND (CALL_EXPR_ARG (exp, 0), 0);
+  /* Target clones creates an ARRAY_REF instead of STRING_CST, convert it back
+     to a STRING_CST.  */
+  if (TREE_CODE (arg) == ARRAY_REF
+      && TREE_CODE (TREE_OPERAND (arg, 0)) == STRING_CST
+      && TREE_CODE (TREE_OPERAND (arg, 1)) == INTEGER_CST
+      && compare_tree_int (TREE_OPERAND (arg, 1), 0) == 0)
+    arg = TREE_OPERAND (arg, 0);
+
+  if (TREE_CODE (arg) != STRING_CST)
+    {
+      error ("builtin %qs only accepts a string argument",
+	     rs6000_builtin_info_x[(size_t) fcode].bifname);
+      return const0_rtx;
+    }
+
+  if (fcode == RS6000_BIF_CPU_IS)
+    {
+      const char *cpu = TREE_STRING_POINTER (arg);
+      rtx cpuid = NULL_RTX;
+      for (size_t i = 0; i < ARRAY_SIZE (cpu_is_info); i++)
+	if (strcmp (cpu, cpu_is_info[i].cpu) == 0)
+	  {
+	    /* The CPUID value in the TCB is offset by _DL_FIRST_PLATFORM.  */
+	    cpuid = GEN_INT (cpu_is_info[i].cpuid + _DL_FIRST_PLATFORM);
+	    break;
+	  }
+      if (cpuid == NULL_RTX)
+	{
+	  /* Invalid CPU argument.  */
+	  error ("cpu %qs is an invalid argument to builtin %qs",
+		 cpu, rs6000_builtin_info_x[(size_t) fcode].bifname);
+	  return const0_rtx;
+	}
+
+      rtx platform = gen_reg_rtx (SImode);
+      rtx address = gen_rtx_PLUS (Pmode,
+				  gen_rtx_REG (Pmode, TLS_REGNUM),
+				  GEN_INT (TCB_PLATFORM_OFFSET));
+      rtx tcbmem = gen_const_mem (SImode, address);
+      emit_move_insn (platform, tcbmem);
+      emit_insn (gen_eqsi3 (target, platform, cpuid));
+    }
+  else if (fcode == RS6000_BIF_CPU_SUPPORTS)
+    {
+      const char *hwcap = TREE_STRING_POINTER (arg);
+      rtx mask = NULL_RTX;
+      int hwcap_offset;
+      for (size_t i = 0; i < ARRAY_SIZE (cpu_supports_info); i++)
+	if (strcmp (hwcap, cpu_supports_info[i].hwcap) == 0)
+	  {
+	    mask = GEN_INT (cpu_supports_info[i].mask);
+	    hwcap_offset = TCB_HWCAP_OFFSET (cpu_supports_info[i].id);
+	    break;
+	  }
+      if (mask == NULL_RTX)
+	{
+	  /* Invalid HWCAP argument.  */
+	  error ("%s %qs is an invalid argument to builtin %qs",
+		 "hwcap", hwcap,
+		 rs6000_builtin_info_x[(size_t) fcode].bifname);
+	  return const0_rtx;
+	}
+
+      rtx tcb_hwcap = gen_reg_rtx (SImode);
+      rtx address = gen_rtx_PLUS (Pmode,
+				  gen_rtx_REG (Pmode, TLS_REGNUM),
+				  GEN_INT (hwcap_offset));
+      rtx tcbmem = gen_const_mem (SImode, address);
+      emit_move_insn (tcb_hwcap, tcbmem);
+      rtx scratch1 = gen_reg_rtx (SImode);
+      emit_insn (gen_rtx_SET (scratch1,
+			      gen_rtx_AND (SImode, tcb_hwcap, mask)));
+      rtx scratch2 = gen_reg_rtx (SImode);
+      emit_insn (gen_eqsi3 (scratch2, scratch1, const0_rtx));
+      emit_insn (gen_rtx_SET (target,
+			      gen_rtx_XOR (SImode, scratch2, const1_rtx)));
+    }
+  else
+    gcc_unreachable ();
+
+  /* Record that we have expanded a CPU builtin, so that we can later
+     emit a reference to the special symbol exported by LIBC to ensure we
+     do not link against an old LIBC that doesn't support this feature.  */
+  cpu_builtin_p = true;
+
+#else
+  warning (0, "builtin %qs needs GLIBC (2.23 and newer) that exports hardware "
+	   "capability bits", rs6000_builtin_info_x[(size_t) fcode].bifname);
+
+  /* For old LIBCs, always return FALSE.  */
+  emit_move_insn (target, GEN_INT (0));
+#endif /* TARGET_LIBC_PROVIDES_HWCAP_IN_TCB */
+
   return target;
 }
 
-- 
2.27.0



* [PATCH 09/18] rs6000: Builtin expansion, part 4
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (7 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 08/18] rs6000: Builtin expansion, part 3 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-03  1:52   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 10/18] rs6000: Builtin expansion, part 5 Bill Schmidt
                   ` (9 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Consolidate into elemrev_icode some logic that is scattered throughout
the old altivec_expand_builtin.  Also replace functions for handling
special load and store built-ins:
= ldv_expand_builtin replaces altivec_expand_lv_builtin
= lxvrse_expand_builtin and lxvrze_expand_builtin replace
  altivec_expand_lxvr_builtin
= stv_expand_builtin replaces altivec_expand_stv_builtin

In all cases, there are no logic changes except that some code was
already factored out into rs6000_expand_new_builtin.
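
As a usage sketch for the element-reversing loads selected by
elemrev_icode (hypothetical code, not from the patch; vec_xl is the
usual path to the LD_ELEMREV built-ins):

  #include <altivec.h>

  vector signed int
  load_v4si (signed long off, const signed int *p)
  {
    /* Big endian expands through vsx_load_v4si; little endian uses
       vsx_ld_elemrev_v4si so the element order matches memory.  */
    return vec_xl (off, p);
  }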

2021-09-01  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (elemrev_icode): Implement.
	(ldv_expand_builtin): Likewise.
	(lxvrse_expand_builtin): Likewise.
	(lxvrze_expand_builtin): Likewise.
	(stv_expand_builtin): Likewise.
---
 gcc/config/rs6000/rs6000-call.c | 245 ++++++++++++++++++++++++++++++++
 1 file changed, 245 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 5032e947a8e..33153a5657c 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -14754,12 +14754,142 @@ new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
 static insn_code
 elemrev_icode (rs6000_gen_builtins fcode)
 {
+  switch (fcode)
+    {
+    default:
+      gcc_unreachable ();
+
+    case RS6000_BIF_ST_ELEMREV_V1TI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v1ti
+	: CODE_FOR_vsx_st_elemrev_v1ti;
+
+    case RS6000_BIF_ST_ELEMREV_V2DF:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v2df
+	: CODE_FOR_vsx_st_elemrev_v2df;
+
+    case RS6000_BIF_ST_ELEMREV_V2DI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v2di
+	: CODE_FOR_vsx_st_elemrev_v2di;
+
+    case RS6000_BIF_ST_ELEMREV_V4SF:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v4sf
+	: CODE_FOR_vsx_st_elemrev_v4sf;
+
+    case RS6000_BIF_ST_ELEMREV_V4SI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v4si
+	: CODE_FOR_vsx_st_elemrev_v4si;
+
+    case RS6000_BIF_ST_ELEMREV_V8HI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v8hi
+	: CODE_FOR_vsx_st_elemrev_v8hi;
+
+    case RS6000_BIF_ST_ELEMREV_V16QI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_store_v16qi
+	: CODE_FOR_vsx_st_elemrev_v16qi;
+
+    case RS6000_BIF_LD_ELEMREV_V2DF:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v2df
+	: CODE_FOR_vsx_ld_elemrev_v2df;
+
+    case RS6000_BIF_LD_ELEMREV_V1TI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v1ti
+	: CODE_FOR_vsx_ld_elemrev_v1ti;
+
+    case RS6000_BIF_LD_ELEMREV_V2DI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v2di
+	: CODE_FOR_vsx_ld_elemrev_v2di;
+
+    case RS6000_BIF_LD_ELEMREV_V4SF:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v4sf
+	: CODE_FOR_vsx_ld_elemrev_v4sf;
+
+    case RS6000_BIF_LD_ELEMREV_V4SI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v4si
+	: CODE_FOR_vsx_ld_elemrev_v4si;
+
+    case RS6000_BIF_LD_ELEMREV_V8HI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v8hi
+	: CODE_FOR_vsx_ld_elemrev_v8hi;
+
+    case RS6000_BIF_LD_ELEMREV_V16QI:
+      return BYTES_BIG_ENDIAN
+	? CODE_FOR_vsx_load_v16qi
+	: CODE_FOR_vsx_ld_elemrev_v16qi;
+    }
+  gcc_unreachable ();
   return (insn_code) 0;
 }
 
 static rtx
 ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
 {
+  rtx pat, addr;
+  bool blk = (icode == CODE_FOR_altivec_lvlx
+	      || icode == CODE_FOR_altivec_lvlxl
+	      || icode == CODE_FOR_altivec_lvrx
+	      || icode == CODE_FOR_altivec_lvrxl);
+
+  if (target == 0
+      || GET_MODE (target) != tmode
+      || !insn_data[icode].operand[0].predicate (target, tmode))
+    target = gen_reg_rtx (tmode);
+
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  /* For LVX, express the RTL accurately by ANDing the address with -16.
+     LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
+     so the raw address is fine.  */
+  if (icode == CODE_FOR_altivec_lvx_v1ti
+      || icode == CODE_FOR_altivec_lvx_v2df
+      || icode == CODE_FOR_altivec_lvx_v2di
+      || icode == CODE_FOR_altivec_lvx_v4sf
+      || icode == CODE_FOR_altivec_lvx_v4si
+      || icode == CODE_FOR_altivec_lvx_v8hi
+      || icode == CODE_FOR_altivec_lvx_v16qi)
+    {
+      rtx rawaddr;
+      if (op[0] == const0_rtx)
+	rawaddr = op[1];
+      else
+	{
+	  op[0] = copy_to_mode_reg (Pmode, op[0]);
+	  rawaddr = gen_rtx_PLUS (Pmode, op[1], op[0]);
+	}
+      addr = gen_rtx_AND (Pmode, rawaddr, gen_rtx_CONST_INT (Pmode, -16));
+      addr = gen_rtx_MEM (blk ? BLKmode : tmode, addr);
+
+      emit_insn (gen_rtx_SET (target, addr));
+    }
+  else
+    {
+      if (op[0] == const0_rtx)
+	addr = gen_rtx_MEM (blk ? BLKmode : tmode, op[1]);
+      else
+	{
+	  op[0] = copy_to_mode_reg (Pmode, op[0]);
+	  addr = gen_rtx_MEM (blk ? BLKmode : tmode,
+			      gen_rtx_PLUS (Pmode, op[1], op[0]));
+	}
+
+      pat = GEN_FCN (icode) (target, addr);
+      if (!pat)
+	return 0;
+      emit_insn (pat);
+    }
+
   return target;
 }
 
@@ -14767,6 +14897,42 @@ static rtx
 lxvrse_expand_builtin (rtx target, insn_code icode, rtx *op,
 		       machine_mode tmode, machine_mode smode)
 {
+  rtx pat, addr;
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  if (op[0] == const0_rtx)
+    addr = gen_rtx_MEM (tmode, op[1]);
+  else
+    {
+      op[0] = copy_to_mode_reg (Pmode, op[0]);
+      addr = gen_rtx_MEM (smode,
+			  gen_rtx_PLUS (Pmode, op[1], op[0]));
+    }
+
+  rtx discratch = gen_reg_rtx (DImode);
+  rtx tiscratch = gen_reg_rtx (TImode);
+
+  /* Emit the lxvr*x insn.  */
+  pat = GEN_FCN (icode) (tiscratch, addr);
+  if (!pat)
+    return 0;
+  emit_insn (pat);
+
+  /* Emit a sign extension from QI,HI,WI to double (DI).  */
+  rtx scratch = gen_lowpart (smode, tiscratch);
+  if (icode == CODE_FOR_vsx_lxvrbx)
+    emit_insn (gen_extendqidi2 (discratch, scratch));
+  else if (icode == CODE_FOR_vsx_lxvrhx)
+    emit_insn (gen_extendhidi2 (discratch, scratch));
+  else if (icode == CODE_FOR_vsx_lxvrwx)
+    emit_insn (gen_extendsidi2 (discratch, scratch));
+  /* If scratch is already DImode, use it directly as discratch.  */
+  if (icode == CODE_FOR_vsx_lxvrdx)
+    discratch = scratch;
+
+  /* Emit the sign extension from DI (doubleword) to TI (quadword).  */
+  emit_insn (gen_extendditi2 (target, discratch));
+
   return target;
 }
 
@@ -14774,6 +14940,22 @@ static rtx
 lxvrze_expand_builtin (rtx target, insn_code icode, rtx *op,
 		       machine_mode tmode, machine_mode smode)
 {
+  rtx pat, addr;
+  op[1] = copy_to_mode_reg (Pmode, op[1]);
+
+  if (op[0] == const0_rtx)
+    addr = gen_rtx_MEM (tmode, op[1]);
+  else
+    {
+      op[0] = copy_to_mode_reg (Pmode, op[0]);
+      addr = gen_rtx_MEM (smode,
+			  gen_rtx_PLUS (Pmode, op[1], op[0]));
+    }
+
+  pat = GEN_FCN (icode) (target, addr);
+  if (!pat)
+    return 0;
+  emit_insn (pat);
   return target;
 }
 
@@ -14781,6 +14963,69 @@ static rtx
 stv_expand_builtin (insn_code icode, rtx *op,
 		    machine_mode tmode, machine_mode smode)
 {
+  rtx pat, addr, rawaddr, truncrtx;
+  op[2] = copy_to_mode_reg (Pmode, op[2]);
+
+  /* For STVX, express the RTL accurately by ANDing the address with -16.
+     STVXL and STVE*X expand to use UNSPECs to hide their special behavior,
+     so the raw address is fine.  */
+  if (icode == CODE_FOR_altivec_stvx_v2df
+      || icode == CODE_FOR_altivec_stvx_v2di
+      || icode == CODE_FOR_altivec_stvx_v4sf
+      || icode == CODE_FOR_altivec_stvx_v4si
+      || icode == CODE_FOR_altivec_stvx_v8hi
+      || icode == CODE_FOR_altivec_stvx_v16qi)
+    {
+      if (op[1] == const0_rtx)
+	rawaddr = op[2];
+      else
+	{
+	  op[1] = copy_to_mode_reg (Pmode, op[1]);
+	  rawaddr = gen_rtx_PLUS (Pmode, op[2], op[1]);
+	}
+
+      addr = gen_rtx_AND (Pmode, rawaddr, gen_rtx_CONST_INT (Pmode, -16));
+      addr = gen_rtx_MEM (tmode, addr);
+      op[0] = copy_to_mode_reg (tmode, op[0]);
+      emit_insn (gen_rtx_SET (addr, op[0]));
+    }
+  else if (icode == CODE_FOR_vsx_stxvrbx
+	   || icode == CODE_FOR_vsx_stxvrhx
+	   || icode == CODE_FOR_vsx_stxvrwx
+	   || icode == CODE_FOR_vsx_stxvrdx)
+    {
+      truncrtx = gen_rtx_TRUNCATE (tmode, op[0]);
+      op[0] = copy_to_mode_reg (E_TImode, truncrtx);
+
+      if (op[1] == const0_rtx)
+	addr = gen_rtx_MEM (Pmode, op[2]);
+      else
+	{
+	  op[1] = copy_to_mode_reg (Pmode, op[1]);
+	  addr = gen_rtx_MEM (tmode, gen_rtx_PLUS (Pmode, op[2], op[1]));
+	}
+      pat = GEN_FCN (icode) (addr, op[0]);
+      if (pat)
+	emit_insn (pat);
+    }
+  else
+    {
+      if (!insn_data[icode].operand[1].predicate (op[0], smode))
+	op[0] = copy_to_mode_reg (smode, op[0]);
+
+      if (op[1] == const0_rtx)
+	addr = gen_rtx_MEM (tmode, op[2]);
+      else
+	{
+	  op[1] = copy_to_mode_reg (Pmode, op[1]);
+	  addr = gen_rtx_MEM (tmode, gen_rtx_PLUS (Pmode, op[2], op[1]));
+	}
+
+      pat = GEN_FCN (icode) (addr, op[0]);
+      if (pat)
+	emit_insn (pat);
+    }
+
   return NULL_RTX;
 }
 
-- 
2.27.0
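
For reference, the expanders above follow the underlying instruction
semantics: lvx ignores the low four bits of the effective address,
hence the explicit AND with -16, and the lxvr*x loads can be widened to
a full quadword by sign extension.  A minimal usage sketch (illustrative
only, not part of the patch; assumes -mcpu=power10 so that vec_xl_sext
is available):

  #include <altivec.h>

  /* Load the 16-byte block containing p + off; lvx masks the low
     four address bits, so a misaligned address rounds down.  */
  vector signed int
  load_aligned (int off, const vector signed int *p)
  {
    return vec_ld (off, p);
  }

  /* Load one 32-bit element and sign-extend it to 128 bits, i.e.
     lxvrwx followed by the extendsidi2/extendditi2 chain above.  */
  vector signed __int128
  load_sext (signed long long off, signed int *p)
  {
    return vec_xl_sext (off, p);
  }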


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 10/18] rs6000: Builtin expansion, part 5
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (8 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 09/18] rs6000: Builtin expansion, part 4 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-04  0:55   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 11/18] rs6000: Builtin expansion, part 6 Bill Schmidt
                   ` (8 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Replace mma_expand_builtin.  There are no significant logic changes,
just adjustments to use the new infrastructure and clean up formatting.
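
As context, a minimal MMA usage sketch (illustrative only; assumes
-mcpu=power10): the accumulator is the INOUT operand that the code below
ties to its input register, and the assemble builtins are the ones whose
source operands are swapped for little endian.

  #include <altivec.h>

  void
  ger_accumulate (__vector_quad *acc, vector unsigned char a,
                  vector unsigned char b)
  {
    __builtin_mma_xvf32ger (acc, a, b);    /* acc is output only.  */
    __builtin_mma_xvf32gerpp (acc, a, b);  /* acc is read and written.  */
  }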

2021-09-01  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (new_mma_expand_builtin):
	Implement.
---
 gcc/config/rs6000/rs6000-call.c | 103 ++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index 33153a5657c..a8956eefd95 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15034,6 +15034,109 @@ static rtx
 new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
 			rs6000_gen_builtins fcode)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  tree arg;
+  call_expr_arg_iterator iter;
+  const struct insn_operand_data *insn_op;
+  rtx op[MAX_MMA_OPERANDS];
+  unsigned nopnds = 0;
+  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
+  machine_mode tmode = VOIDmode;
+
+  if (!void_func)
+    {
+      tmode = insn_data[icode].operand[0].mode;
+      if (!target
+	  || GET_MODE (target) != tmode
+	  || !insn_data[icode].operand[0].predicate (target, tmode))
+	target = gen_reg_rtx (tmode);
+      op[nopnds++] = target;
+    }
+  else
+    target = const0_rtx;
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+    {
+      if (arg == error_mark_node)
+	return const0_rtx;
+
+      rtx opnd;
+      insn_op = &insn_data[icode].operand[nopnds];
+      if (TREE_CODE (arg) == ADDR_EXPR
+	  && MEM_P (DECL_RTL (TREE_OPERAND (arg, 0))))
+	opnd = DECL_RTL (TREE_OPERAND (arg, 0));
+      else
+	opnd = expand_normal (arg);
+
+      if (!insn_op->predicate (opnd, insn_op->mode))
+	{
+	  if (!strcmp (insn_op->constraint, "n"))
+	    {
+	      if (!CONST_INT_P (opnd))
+		error ("argument %d must be an unsigned literal", nopnds);
+	      else
+		error ("argument %d is an unsigned literal that is "
+		       "out of range", nopnds);
+	      return const0_rtx;
+	    }
+	  opnd = copy_to_mode_reg (insn_op->mode, opnd);
+	}
+
+      /* Some MMA instructions have INOUT accumulator operands, so force
+	 their target register to be the same as their input register.  */
+      if (!void_func
+	  && nopnds == 1
+	  && !strcmp (insn_op->constraint, "0")
+	  && insn_op->mode == tmode
+	  && REG_P (opnd)
+	  && insn_data[icode].operand[0].predicate (opnd, tmode))
+	target = op[0] = opnd;
+
+      op[nopnds++] = opnd;
+    }
+
+  rtx pat;
+  switch (nopnds)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (op[0]);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (op[0], op[1]);
+      break;
+    case 3:
+      /* The ASSEMBLE builtin source operands are reversed in little-endian
+	 mode, so reorder them.  */
+      if (fcode == RS6000_BIF_ASSEMBLE_PAIR_V_INTERNAL && !WORDS_BIG_ENDIAN)
+	std::swap (op[1], op[2]);
+      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+      break;
+    case 4:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+      break;
+    case 5:
+      /* The ASSEMBLE builtin source operands are reversed in little-endian
+	 mode, so reorder them.  */
+      if (fcode == RS6000_BIF_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
+	{
+	  std::swap (op[1], op[4]);
+	  std::swap (op[2], op[3]);
+	}
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4]);
+      break;
+    case 6:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5]);
+      break;
+    case 7:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3], op[4], op[5], op[6]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (!pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
   return target;
 }
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 11/18] rs6000: Builtin expansion, part 6
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (9 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 10/18] rs6000: Builtin expansion, part 5 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-04  1:24   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 12/18] rs6000: Update rs6000_builtin_decl Bill Schmidt
                   ` (7 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Provide replacements for htm_spr_num and htm_expand_builtin.  No logic
changes are intended here, as usual.  Much code was factored out into
rs6000_expand_new_builtin, so the new version of htm_expand_builtin is
a little tidier.
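
As context, a small HTM usage sketch (illustrative only; assumes -mhtm).
__builtin_tbegin returns nonzero when the transaction starts, which is
the complemented CR EQ bit computed in the code below:

  #include <htmintrin.h>

  int
  add_transactionally (int *p)
  {
    if (__builtin_tbegin (0))
      {
        *p += 1;
        __builtin_tend (0);
        return 1;
      }
    return 0;  /* Transaction failed to start or was aborted.  */
  }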

Also implement support for the "endian" and "32bit" attributes,
which is straightforward.  These just do icode substitution.
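
As an illustrative example (assuming vec_xl still resolves to an
LD_ELEMREV builtin, as before), the same source selects a different
icode depending on endianness:

  #include <altivec.h>

  vector signed int
  load4 (signed int *p)
  {
    /* RS6000_BIF_LD_ELEMREV_V4SI expands via CODE_FOR_vsx_load_v4si on
       big endian and CODE_FOR_vsx_ld_elemrev_v4si on little endian.  */
    return vec_xl (0, p);
  }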

2021-09-01  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (new_htm_spr_num): New function.
	(new_htm_expand_builtin): Implement.
	(rs6000_expand_new_builtin): Handle 32-bit and endian cases.
---
 gcc/config/rs6000/rs6000-call.c | 202 ++++++++++++++++++++++++++++++++
 1 file changed, 202 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8956eefd95..e34f6ce8745 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -15140,11 +15140,171 @@ new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
   return target;
 }
 
+/* Return the appropriate SPR number associated with the given builtin.  */
+static inline HOST_WIDE_INT
+new_htm_spr_num (enum rs6000_gen_builtins code)
+{
+  if (code == RS6000_BIF_GET_TFHAR
+      || code == RS6000_BIF_SET_TFHAR)
+    return TFHAR_SPR;
+  else if (code == RS6000_BIF_GET_TFIAR
+	   || code == RS6000_BIF_SET_TFIAR)
+    return TFIAR_SPR;
+  else if (code == RS6000_BIF_GET_TEXASR
+	   || code == RS6000_BIF_SET_TEXASR)
+    return TEXASR_SPR;
+  gcc_assert (code == RS6000_BIF_GET_TEXASRU
+	      || code == RS6000_BIF_SET_TEXASRU);
+  return TEXASRU_SPR;
+}
+
 /* Expand the HTM builtin in EXP and store the result in TARGET.  */
 static rtx
 new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
 			tree exp, rtx target)
 {
+  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
+  bool nonvoid = TREE_TYPE (TREE_TYPE (fndecl)) != void_type_node;
+
+  if (!TARGET_POWERPC64
+      && (fcode == RS6000_BIF_TABORTDC
+	  || fcode == RS6000_BIF_TABORTDCI))
+    {
+      error ("builtin %qs is only valid in 64-bit mode", bifaddr->bifname);
+      return const0_rtx;
+    }
+
+  rtx op[MAX_HTM_OPERANDS], pat;
+  int nopnds = 0;
+  tree arg;
+  call_expr_arg_iterator iter;
+  insn_code icode = bifaddr->icode;
+  bool uses_spr = bif_is_htmspr (*bifaddr);
+  rtx cr = NULL_RTX;
+
+  if (uses_spr)
+    icode = rs6000_htm_spr_icode (nonvoid);
+  const insn_operand_data *insn_op = &insn_data[icode].operand[0];
+
+  if (nonvoid)
+    {
+      machine_mode tmode = (uses_spr) ? insn_op->mode : E_SImode;
+      if (!target
+	  || GET_MODE (target) != tmode
+	  || (uses_spr && !insn_op->predicate (target, tmode)))
+	target = gen_reg_rtx (tmode);
+      if (uses_spr)
+	op[nopnds++] = target;
+    }
+
+  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
+    {
+      if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
+	return const0_rtx;
+
+      insn_op = &insn_data[icode].operand[nopnds];
+      op[nopnds] = expand_normal (arg);
+
+      if (!insn_op->predicate (op[nopnds], insn_op->mode))
+	{
+	  if (!strcmp (insn_op->constraint, "n"))
+	    {
+	      int arg_num = (nonvoid) ? nopnds : nopnds + 1;
+	      if (!CONST_INT_P (op[nopnds]))
+		error ("argument %d must be an unsigned literal", arg_num);
+	      else
+		error ("argument %d is an unsigned literal that is "
+		       "out of range", arg_num);
+	      return const0_rtx;
+	    }
+	  op[nopnds] = copy_to_mode_reg (insn_op->mode, op[nopnds]);
+	}
+
+      nopnds++;
+    }
+
+  /* Handle the builtins for extended mnemonics.  These accept
+     no arguments, but map to builtins that take arguments.  */
+  switch (fcode)
+    {
+    case RS6000_BIF_TENDALL:  /* Alias for: tend. 1  */
+    case RS6000_BIF_TRESUME:  /* Alias for: tsr. 1  */
+      op[nopnds++] = GEN_INT (1);
+      break;
+    case RS6000_BIF_TSUSPEND: /* Alias for: tsr. 0  */
+      op[nopnds++] = GEN_INT (0);
+      break;
+    default:
+      break;
+    }
+
+  /* If this builtin accesses SPRs, then pass in the appropriate
+     SPR number and SPR regno as the last two operands.  */
+  if (uses_spr)
+    {
+      machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;
+      op[nopnds++] = gen_rtx_CONST_INT (mode, new_htm_spr_num (fcode));
+    }
+  /* If this builtin accesses a CR, then pass in a scratch
+     CR as the last operand.  */
+  else if (bif_is_htmcr (*bifaddr))
+    {
+      cr = gen_reg_rtx (CCmode);
+      op[nopnds++] = cr;
+    }
+
+  switch (nopnds)
+    {
+    case 1:
+      pat = GEN_FCN (icode) (op[0]);
+      break;
+    case 2:
+      pat = GEN_FCN (icode) (op[0], op[1]);
+      break;
+    case 3:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2]);
+      break;
+    case 4:
+      pat = GEN_FCN (icode) (op[0], op[1], op[2], op[3]);
+      break;
+    default:
+      gcc_unreachable ();
+    }
+  if (!pat)
+    return NULL_RTX;
+  emit_insn (pat);
+
+  if (bif_is_htmcr (*bifaddr))
+    {
+      if (fcode == RS6000_BIF_TBEGIN)
+	{
+	  /* Emit code to set TARGET to true or false depending on
+	     whether the tbegin. instruction succeeded or failed
+	     to start a transaction.  We do this by placing the 1's
+	     complement of CR's EQ bit into TARGET.  */
+	  rtx scratch = gen_reg_rtx (SImode);
+	  emit_insn (gen_rtx_SET (scratch,
+				  gen_rtx_EQ (SImode, cr,
+					      const0_rtx)));
+	  emit_insn (gen_rtx_SET (target,
+				  gen_rtx_XOR (SImode, scratch,
+					       GEN_INT (1))));
+	}
+      else
+	{
+	  /* Emit code to copy the 4-bit condition register field
+	     CR into the least significant end of register TARGET.  */
+	  rtx scratch1 = gen_reg_rtx (SImode);
+	  rtx scratch2 = gen_reg_rtx (SImode);
+	  rtx subreg = simplify_gen_subreg (CCmode, scratch1, SImode, 0);
+	  emit_insn (gen_movcc (subreg, cr));
+	  emit_insn (gen_lshrsi3 (scratch2, scratch1, GEN_INT (28)));
+	  emit_insn (gen_andsi3 (target, scratch2, GEN_INT (0xf)));
+	}
+    }
+
+  if (nonvoid)
+    return target;
   return const0_rtx;
 }
 
@@ -15294,6 +15454,48 @@ rs6000_expand_new_builtin (tree exp, rtx target,
   if (bif_is_htm (*bifaddr))
     return new_htm_expand_builtin (bifaddr, fcode, exp, target);
 
+  if (bif_is_32bit (*bifaddr) && TARGET_32BIT)
+    {
+      if (fcode == RS6000_BIF_MFTB)
+	icode = CODE_FOR_rs6000_mftb_si;
+      else
+	gcc_unreachable ();
+    }
+
+  if (bif_is_endian (*bifaddr) && BYTES_BIG_ENDIAN)
+    {
+      if (fcode == RS6000_BIF_LD_ELEMREV_V1TI)
+	icode = CODE_FOR_vsx_load_v1ti;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V2DF)
+	icode = CODE_FOR_vsx_load_v2df;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V2DI)
+	icode = CODE_FOR_vsx_load_v2di;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V4SF)
+	icode = CODE_FOR_vsx_load_v4sf;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V4SI)
+	icode = CODE_FOR_vsx_load_v4si;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V8HI)
+	icode = CODE_FOR_vsx_load_v8hi;
+      else if (fcode == RS6000_BIF_LD_ELEMREV_V16QI)
+	icode = CODE_FOR_vsx_load_v16qi;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V1TI)
+	icode = CODE_FOR_vsx_store_v1ti;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V2DF)
+	icode = CODE_FOR_vsx_store_v2df;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V2DI)
+	icode = CODE_FOR_vsx_store_v2di;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V4SF)
+	icode = CODE_FOR_vsx_store_v4sf;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V4SI)
+	icode = CODE_FOR_vsx_store_v4si;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V8HI)
+	icode = CODE_FOR_vsx_store_v8hi;
+      else if (fcode == RS6000_BIF_ST_ELEMREV_V16QI)
+	icode = CODE_FOR_vsx_store_v16qi;
+      else
+	gcc_unreachable ();
+    }
+
   rtx pat;
   const int MAX_BUILTIN_ARGS = 6;
   tree arg[MAX_BUILTIN_ARGS];
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 12/18] rs6000: Update rs6000_builtin_decl
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (10 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 11/18] rs6000: Builtin expansion, part 6 Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 20:27   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x Bill Schmidt
                   ` (6 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

Create a new version of this function that uses the new infrastructure;
in particular, it checks whether each builtin is supported using the new
mechanism.
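
A sketch of the caller-side contract (the caller shown is hypothetical;
targetm.builtin_decl is the hook this function backs):

  tree decl = targetm.builtin_decl ((unsigned) RS6000_BIF_MFFS,
                                    /*initialize_p=*/true);
  if (decl == error_mark_node)
    {
      /* Not supported in this configuration; a diagnostic has already
         been issued via rs6000_invalid_new_builtin.  */
    }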

2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New
	function.
	(rs6000_builtin_decl): Call it.
---
 gcc/config/rs6000/rs6000-call.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index e34f6ce8745..b9ce3f28f9a 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -16236,11 +16236,31 @@ rs6000_init_builtins (void)
     }
 }
 
+static tree
+rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
+{
+  rs6000_gen_builtins fcode = (rs6000_gen_builtins) code;
+
+  if (fcode >= RS6000_OVLD_MAX)
+    return error_mark_node;
+
+  if (!rs6000_new_builtin_is_supported (fcode))
+    {
+      rs6000_invalid_new_builtin (fcode);
+      return error_mark_node;
+    }
+
+  return rs6000_builtin_decls_x[code];
+}
+
 /* Returns the rs6000 builtin decl for CODE.  */
 
 tree
 rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
 {
+  if (new_builtins_are_live)
+    return rs6000_new_builtin_decl (code, initialize_p);
+
   HOST_WIDE_INT fnmask;
 
   if (code >= RS6000_BUILTIN_COUNT)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (11 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 12/18] rs6000: Update rs6000_builtin_decl Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 20:36   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 14/18] rs6000: Debug support Bill Schmidt
                   ` (5 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

There are a few leftover places where we use the old rs6000_builtin_decls
array, but we need to use rs6000_builtin_decls_x instead when the new
builtins infrastructure is in play.
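
For context, each affected hook is reachable from ordinary user code.
Illustrative sketches (assuming a VSX-enabled target; none of this code
is part of the patch):

  /* rs6000_builtin_reciprocal: with -ffast-math -mrecip, a vectorized
     1.0 / sqrt(x) loop may be rewritten through the RSQRT decls.  */
  void
  rsq (double *restrict d, const double *restrict s, int n)
  {
    for (int i = 0; i < n; i++)
      d[i] = 1.0 / __builtin_sqrt (s[i]);
  }

  /* add_condition_to_bb: function multi-versioning tests features
     through the CPU_SUPPORTS decl, i.e. __builtin_cpu_supports.  */
  __attribute__ ((target_clones ("cpu=power9", "default")))
  int fmv (void) { return 42; }

  /* rs6000_atomic_assign_expand_fenv: C11 atomic compound assignment on
     floating-point types wraps the update in the mffs/mtfsf sequence.  */
  _Atomic double ad;
  void bump (void) { ad += 1.0; }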

2021-07-28  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
	rs6000_builtin_decls_x when appropriate.
	(add_condition_to_bb): Likewise.
	(rs6000_atomic_assign_expand_fenv): Likewise.
---
 gcc/config/rs6000/rs6000.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 52c78c7500c..fa86b797b0d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -22681,12 +22681,16 @@ rs6000_builtin_reciprocal (tree fndecl)
       if (!RS6000_RECIP_AUTO_RSQRTE_P (V2DFmode))
 	return NULL_TREE;
 
+      if (new_builtins_are_live)
+	return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
       return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_2DF];
 
     case VSX_BUILTIN_XVSQRTSP:
       if (!RS6000_RECIP_AUTO_RSQRTE_P (V4SFmode))
 	return NULL_TREE;
 
+      if (new_builtins_are_live)
+	return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_4SF];
       return rs6000_builtin_decls[VSX_BUILTIN_RSQRT_4SF];
 
     default:
@@ -25275,7 +25279,10 @@ add_condition_to_bb (tree function_decl, tree version_decl,
 
   tree bool_zero = build_int_cst (bool_int_type_node, 0);
   tree cond_var = create_tmp_var (bool_int_type_node);
-  tree predicate_decl = rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS];
+  tree predicate_decl
+    = (new_builtins_are_live
+       ? rs6000_builtin_decls_x[(int) RS6000_BIF_CPU_SUPPORTS]
+       : rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS]);
   const char *arg_str = rs6000_clone_map[clone_isa].name;
   tree predicate_arg = build_string_literal (strlen (arg_str) + 1, arg_str);
   gimple *call_cond_stmt = gimple_build_call (predicate_decl, 1, predicate_arg);
@@ -27915,8 +27922,14 @@ rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
       return;
     }
 
-  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
-  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree mffs
+    = (new_builtins_are_live
+       ? rs6000_builtin_decls_x[RS6000_BIF_MFFS]
+       : rs6000_builtin_decls[RS6000_BUILTIN_MFFS]);
+  tree mtfsf
+    = (new_builtins_are_live
+       ? rs6000_builtin_decls_x[RS6000_BIF_MTFSF]
+       : rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]);
   tree call_mffs = build_call_expr (mffs, 0);
 
   /* Generates the equivalent of feholdexcept (&fenv_var)
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 14/18] rs6000: Debug support
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (12 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 21:34   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 15/18] rs6000: Update altivec.h for automated interfaces Bill Schmidt
                   ` (4 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

2021-07-28  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
	(def_builtin): Change debug formatting for easier parsing and
	include more information.
	(rs6000_init_builtins): Add dump of autogenerated builtins.
	(altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
	completeness.
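
With -mdebug=builtin, each built-in function now dumps as one parsable
signature line.  An illustrative (made-up) example of the new format,
using the type tokens from rs6000_debug_type:

  vf vec_ceil (vf); = const [  57]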
---
 gcc/config/rs6000/rs6000-call.c | 191 +++++++++++++++++++++++++++++++-
 1 file changed, 185 insertions(+), 6 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index b9ce3f28f9a..b6f669f06a5 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -8880,6 +8880,106 @@ rs6000_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
 
 /* Builtins.  */
 
+/* Debug utility to translate a type node to a single token.  */
+static const char *
+rs6000_debug_type (tree type)
+{
+  if (type == void_type_node)
+    return "void";
+  else if (type == long_integer_type_node)
+    return "long";
+  else if (type == long_unsigned_type_node)
+    return "ulong";
+  else if (type == long_long_integer_type_node)
+    return "longlong";
+  else if (type == long_long_unsigned_type_node)
+    return "ulonglong";
+  else if (type == bool_V2DI_type_node)
+    return "vbll";
+  else if (type == bool_V4SI_type_node)
+    return "vbi";
+  else if (type == bool_V8HI_type_node)
+    return "vbs";
+  else if (type == bool_V16QI_type_node)
+    return "vbc";
+  else if (type == bool_int_type_node)
+    return "bool";
+  else if (type == dfloat64_type_node)
+    return "_Decimal64";
+  else if (type == double_type_node)
+    return "double";
+  else if (type == intDI_type_node)
+    return "sll";
+  else if (type == intHI_type_node)
+    return "ss";
+  else if (type == ibm128_float_type_node)
+    return "__ibm128";
+  else if (type == opaque_V4SI_type_node)
+    return "opaque";
+  else if (POINTER_TYPE_P (type))
+    return "void*";
+  else if (type == intQI_type_node || type == char_type_node)
+    return "sc";
+  else if (type == dfloat32_type_node)
+    return "_Decimal32";
+  else if (type == float_type_node)
+    return "float";
+  else if (type == intSI_type_node || type == integer_type_node)
+    return "si";
+  else if (type == dfloat128_type_node)
+    return "_Decimal128";
+  else if (type == long_double_type_node)
+    return "longdouble";
+  else if (type == intTI_type_node)
+    return "sq";
+  else if (type == unsigned_intDI_type_node)
+    return "ull";
+  else if (type == unsigned_intHI_type_node)
+    return "us";
+  else if (type == unsigned_intQI_type_node)
+    return "uc";
+  else if (type == unsigned_intSI_type_node)
+    return "ui";
+  else if (type == unsigned_intTI_type_node)
+    return "uq";
+  else if (type == unsigned_V1TI_type_node)
+    return "vuq";
+  else if (type == unsigned_V2DI_type_node)
+    return "vull";
+  else if (type == unsigned_V4SI_type_node)
+    return "vui";
+  else if (type == unsigned_V8HI_type_node)
+    return "vus";
+  else if (type == unsigned_V16QI_type_node)
+    return "vuc";
+  else if (type == V16QI_type_node)
+    return "vsc";
+  else if (type == V1TI_type_node)
+    return "vsq";
+  else if (type == V2DF_type_node)
+    return "vd";
+  else if (type == V2DI_type_node)
+    return "vsll";
+  else if (type == V4SF_type_node)
+    return "vf";
+  else if (type == V4SI_type_node)
+    return "vsi";
+  else if (type == V8HI_type_node)
+    return "vss";
+  else if (type == pixel_V8HI_type_node)
+    return "vp";
+  else if (type == pcvoid_type_node)
+    return "voidc*";
+  else if (type == float128_type_node)
+    return "_Float128";
+  else if (type == vector_pair_type_node)
+    return "__vector_pair";
+  else if (type == vector_quad_type_node)
+    return "__vector_quad";
+  else
+    return "unknown";
+}
+
 static void
 def_builtin (const char *name, tree type, enum rs6000_builtins code)
 {
@@ -8908,7 +9008,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
       /* const function, function only depends on the inputs.  */
       TREE_READONLY (t) = 1;
       TREE_NOTHROW (t) = 1;
-      attr_string = ", const";
+      attr_string = "= const";
     }
   else if ((classify & RS6000_BTC_PURE) != 0)
     {
@@ -8916,7 +9016,7 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 	 external state.  */
       DECL_PURE_P (t) = 1;
       TREE_NOTHROW (t) = 1;
-      attr_string = ", pure";
+      attr_string = "= pure";
     }
   else if ((classify & RS6000_BTC_FP) != 0)
     {
@@ -8930,12 +9030,12 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
 	{
 	  DECL_PURE_P (t) = 1;
 	  DECL_IS_NOVOPS (t) = 1;
-	  attr_string = ", fp, pure";
+	  attr_string = "= fp, pure";
 	}
       else
 	{
 	  TREE_READONLY (t) = 1;
-	  attr_string = ", fp, const";
+	  attr_string = "= fp, const";
 	}
     }
   else if ((classify & (RS6000_BTC_QUAD | RS6000_BTC_PAIR)) != 0)
@@ -8945,8 +9045,20 @@ def_builtin (const char *name, tree type, enum rs6000_builtins code)
     gcc_unreachable ();
 
   if (TARGET_DEBUG_BUILTIN)
-    fprintf (stderr, "rs6000_builtin, code = %4d, %s%s\n",
-	     (int)code, name, attr_string);
+    {
+      tree t = TREE_TYPE (type);
+      fprintf (stderr, "%s %s (", rs6000_debug_type (t), name);
+      t = TYPE_ARG_TYPES (type);
+      while (t && TREE_VALUE (t) != void_type_node)
+	{
+	  fprintf (stderr, "%s",
+		   rs6000_debug_type (TREE_VALUE (t)));
+	  t = TREE_CHAIN (t);
+	  if (t && TREE_VALUE (t) != void_type_node)
+	    fprintf (stderr, ", ");
+	}
+      fprintf (stderr, "); %s [%4d]\n", attr_string, (int)code);
+    }
 }
 
 static const struct builtin_compatibility bdesc_compat[] =
@@ -16097,6 +16209,67 @@ rs6000_init_builtins (void)
   /* Execute the autogenerated initialization code for builtins.  */
   rs6000_init_generated_builtins ();
 
+  if (TARGET_DEBUG_BUILTIN)
+    {
+      fprintf (stderr, "\nAutogenerated built-in functions:\n\n");
+      for (int i = 1; i < (int) RS6000_BIF_MAX; i++)
+	{
+	  bif_enable e = rs6000_builtin_info_x[i].enable;
+	  if (e == ENB_P5 && !TARGET_POPCNTB)
+	    continue;
+	  if (e == ENB_P6 && !TARGET_CMPB)
+	    continue;
+	  if (e == ENB_ALTIVEC && !TARGET_ALTIVEC)
+	    continue;
+	  if (e == ENB_VSX && !TARGET_VSX)
+	    continue;
+	  if (e == ENB_P7 && !TARGET_POPCNTD)
+	    continue;
+	  if (e == ENB_P7_64 && (!TARGET_POPCNTD || !TARGET_POWERPC64))
+	    continue;
+	  if (e == ENB_P8 && !TARGET_DIRECT_MOVE)
+	    continue;
+	  if (e == ENB_P8V && !TARGET_P8_VECTOR)
+	    continue;
+	  if (e == ENB_P9 && !TARGET_MODULO)
+	    continue;
+	  if (e == ENB_P9_64 && (!TARGET_MODULO || !TARGET_POWERPC64))
+	    continue;
+	  if (e == ENB_P9V && !TARGET_P9_VECTOR)
+	    continue;
+	  if (e == ENB_IEEE128_HW && !TARGET_FLOAT128_HW)
+	    continue;
+	  if (e == ENB_DFP && !TARGET_DFP)
+	    continue;
+	  if (e == ENB_CRYPTO && !TARGET_CRYPTO)
+	    continue;
+	  if (e == ENB_HTM && !TARGET_HTM)
+	    continue;
+	  if (e == ENB_P10 && !TARGET_POWER10)
+	    continue;
+	  if (e == ENB_P10_64 && (!TARGET_POWER10 || !TARGET_POWERPC64))
+	    continue;
+	  if (e == ENB_MMA && !TARGET_MMA)
+	    continue;
+	  tree fntype = rs6000_builtin_info_x[i].fntype;
+	  tree t = TREE_TYPE (fntype);
+	  fprintf (stderr, "%s %s (", rs6000_debug_type (t),
+		   rs6000_builtin_info_x[i].bifname);
+	  t = TYPE_ARG_TYPES (fntype);
+	  while (t && TREE_VALUE (t) != void_type_node)
+	    {
+	      fprintf (stderr, "%s",
+		       rs6000_debug_type (TREE_VALUE (t)));
+	      t = TREE_CHAIN (t);
+	      if (t && TREE_VALUE (t) != void_type_node)
+		fprintf (stderr, ", ");
+	    }
+	  fprintf (stderr, "); %s [%4d]\n",
+		   rs6000_builtin_info_x[i].attr_string, (int) i);
+	}
+      fprintf (stderr, "\nEnd autogenerated built-in functions.\n\n\n");
+    }
+
   if (new_builtins_are_live)
     {
       altivec_builtin_mask_for_load
@@ -16763,6 +16936,12 @@ altivec_init_builtins (void)
 			       ALTIVEC_BUILTIN_MASK_FOR_LOAD,
 			       BUILT_IN_MD, NULL, NULL_TREE);
   TREE_READONLY (decl) = 1;
+  if (TARGET_DEBUG_BUILTIN)
+    fprintf (stderr, "%s __builtin_altivec_mask_for_load (%s); [%4d]\n",
+	     rs6000_debug_type (TREE_TYPE (v16qi_ftype_pcvoid)),
+	     rs6000_debug_type (TREE_VALUE
+				(TYPE_ARG_TYPES (v16qi_ftype_pcvoid))),
+	     (int) ALTIVEC_BUILTIN_MASK_FOR_LOAD);
   /* Record the decl. Will be used by rs6000_builtin_mask_for_load.  */
   altivec_builtin_mask_for_load = decl;
 
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 15/18] rs6000: Update altivec.h for automated interfaces
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (13 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 14/18] rs6000: Debug support Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 22:08   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 16/18] rs6000: Test case adjustments Bill Schmidt
                   ` (3 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

2021-07-28  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/altivec.h: Delete a number of #defines that are
	now superfluous.  Alphabetize.  Include rs6000-vecdefines.h.
	Include some synonyms.
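
For reference, an illustrative consequence of the new layout (not part
of the patch): most interfaces now come from the generated
rs6000-vecdefines.h, while deprecated mnemonics survive as plain defines
for their modern spellings, so existing code keeps compiling:

  #include <altivec.h>

  vector float
  old_style_load (int off, const vector float *p)
  {
    return vec_lvx (off, p);  /* Now literally #define vec_lvx vec_ld.  */
  }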
---
 gcc/config/rs6000/altivec.h | 519 +++---------------------------------
 1 file changed, 38 insertions(+), 481 deletions(-)

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index 5b631c7ebaf..9dfa285ccd1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -55,32 +55,36 @@
 #define __CR6_LT		2
 #define __CR6_LT_REV		3
 
-/* Synonyms.  */
+#include "rs6000-vecdefines.h"
+
+/* Deprecated interfaces.  */
+#define vec_lvx vec_ld
+#define vec_lvxl vec_ldl
+#define vec_stvx vec_st
+#define vec_stvxl vec_stl
 #define vec_vaddcuw vec_addc
 #define vec_vand vec_and
 #define vec_vandc vec_andc
-#define vec_vrfip vec_ceil
 #define vec_vcmpbfp vec_cmpb
 #define vec_vcmpgefp vec_cmpge
 #define vec_vctsxs vec_cts
 #define vec_vctuxs vec_ctu
 #define vec_vexptefp vec_expte
-#define vec_vrfim vec_floor
-#define vec_lvx vec_ld
-#define vec_lvxl vec_ldl
 #define vec_vlogefp vec_loge
 #define vec_vmaddfp vec_madd
 #define vec_vmhaddshs vec_madds
-#define vec_vmladduhm vec_mladd
 #define vec_vmhraddshs vec_mradds
+#define vec_vmladduhm vec_mladd
 #define vec_vnmsubfp vec_nmsub
 #define vec_vnor vec_nor
 #define vec_vor vec_or
-#define vec_vpkpx vec_packpx
 #define vec_vperm vec_perm
-#define vec_permxor __builtin_vec_vpermxor
+#define vec_vpkpx vec_packpx
 #define vec_vrefp vec_re
+#define vec_vrfim vec_floor
 #define vec_vrfin vec_round
+#define vec_vrfip vec_ceil
+#define vec_vrfiz vec_trunc
 #define vec_vrsqrtefp vec_rsqrte
 #define vec_vsel vec_sel
 #define vec_vsldoi vec_sld
@@ -91,440 +95,53 @@
 #define vec_vspltisw vec_splat_s32
 #define vec_vsr vec_srl
 #define vec_vsro vec_sro
-#define vec_stvx vec_st
-#define vec_stvxl vec_stl
 #define vec_vsubcuw vec_subc
 #define vec_vsum2sws vec_sum2s
 #define vec_vsumsws vec_sums
-#define vec_vrfiz vec_trunc
 #define vec_vxor vec_xor
 
+/* For _ARCH_PWR8.  Always define to support #pragma GCC target.  */
+#define vec_vclz vec_cntlz
+#define vec_vgbbd vec_gb
+#define vec_vmrgew vec_mergee
+#define vec_vmrgow vec_mergeo
+#define vec_vpopcntu vec_popcnt
+#define vec_vrld vec_rl
+#define vec_vsld vec_sl
+#define vec_vsrd vec_sr
+#define vec_vsrad vec_sra
+
+/* For _ARCH_PWR9.  Always define to support #pragma GCC target.  */
+#define vec_extract_fp_from_shorth vec_extract_fp32_from_shorth
+#define vec_extract_fp_from_shortl vec_extract_fp32_from_shortl
+#define vec_vctz vec_cnttz
+
+/* Synonyms.  */
 /* Functions that are resolved by the backend to one of the
    typed builtins.  */
-#define vec_vaddfp __builtin_vec_vaddfp
-#define vec_addc __builtin_vec_addc
-#define vec_adde __builtin_vec_adde
-#define vec_addec __builtin_vec_addec
-#define vec_vaddsws __builtin_vec_vaddsws
-#define vec_vaddshs __builtin_vec_vaddshs
-#define vec_vaddsbs __builtin_vec_vaddsbs
-#define vec_vavgsw __builtin_vec_vavgsw
-#define vec_vavguw __builtin_vec_vavguw
-#define vec_vavgsh __builtin_vec_vavgsh
-#define vec_vavguh __builtin_vec_vavguh
-#define vec_vavgsb __builtin_vec_vavgsb
-#define vec_vavgub __builtin_vec_vavgub
-#define vec_ceil __builtin_vec_ceil
-#define vec_cmpb __builtin_vec_cmpb
-#define vec_vcmpeqfp __builtin_vec_vcmpeqfp
-#define vec_cmpge __builtin_vec_cmpge
-#define vec_vcmpgtfp __builtin_vec_vcmpgtfp
-#define vec_vcmpgtsw __builtin_vec_vcmpgtsw
-#define vec_vcmpgtuw __builtin_vec_vcmpgtuw
-#define vec_vcmpgtsh __builtin_vec_vcmpgtsh
-#define vec_vcmpgtuh __builtin_vec_vcmpgtuh
-#define vec_vcmpgtsb __builtin_vec_vcmpgtsb
-#define vec_vcmpgtub __builtin_vec_vcmpgtub
-#define vec_vcfsx __builtin_vec_vcfsx
-#define vec_vcfux __builtin_vec_vcfux
-#define vec_cts __builtin_vec_cts
-#define vec_ctu __builtin_vec_ctu
-#define vec_cpsgn __builtin_vec_copysign
-#define vec_double __builtin_vec_double
-#define vec_doublee __builtin_vec_doublee
-#define vec_doubleo __builtin_vec_doubleo
-#define vec_doublel __builtin_vec_doublel
-#define vec_doubleh __builtin_vec_doubleh
-#define vec_expte __builtin_vec_expte
-#define vec_float __builtin_vec_float
-#define vec_float2 __builtin_vec_float2
-#define vec_floate __builtin_vec_floate
-#define vec_floato __builtin_vec_floato
-#define vec_floor __builtin_vec_floor
-#define vec_loge __builtin_vec_loge
-#define vec_madd __builtin_vec_madd
-#define vec_madds __builtin_vec_madds
-#define vec_mtvscr __builtin_vec_mtvscr
-#define vec_reve __builtin_vec_vreve
-#define vec_vmaxfp __builtin_vec_vmaxfp
-#define vec_vmaxsw __builtin_vec_vmaxsw
-#define vec_vmaxsh __builtin_vec_vmaxsh
-#define vec_vmaxsb __builtin_vec_vmaxsb
-#define vec_vminfp __builtin_vec_vminfp
-#define vec_vminsw __builtin_vec_vminsw
-#define vec_vminsh __builtin_vec_vminsh
-#define vec_vminsb __builtin_vec_vminsb
-#define vec_mradds __builtin_vec_mradds
-#define vec_vmsumshm __builtin_vec_vmsumshm
-#define vec_vmsumuhm __builtin_vec_vmsumuhm
-#define vec_vmsummbm __builtin_vec_vmsummbm
-#define vec_vmsumubm __builtin_vec_vmsumubm
-#define vec_vmsumshs __builtin_vec_vmsumshs
-#define vec_vmsumuhs __builtin_vec_vmsumuhs
-#define vec_vmsumudm __builtin_vec_vmsumudm
-#define vec_vmulesb __builtin_vec_vmulesb
-#define vec_vmulesh __builtin_vec_vmulesh
-#define vec_vmuleuh __builtin_vec_vmuleuh
-#define vec_vmuleub __builtin_vec_vmuleub
-#define vec_vmulosh __builtin_vec_vmulosh
-#define vec_vmulouh __builtin_vec_vmulouh
-#define vec_vmulosb __builtin_vec_vmulosb
-#define vec_vmuloub __builtin_vec_vmuloub
-#define vec_nmsub __builtin_vec_nmsub
-#define vec_packpx __builtin_vec_packpx
-#define vec_vpkswss __builtin_vec_vpkswss
-#define vec_vpkuwus __builtin_vec_vpkuwus
-#define vec_vpkshss __builtin_vec_vpkshss
-#define vec_vpkuhus __builtin_vec_vpkuhus
-#define vec_vpkswus __builtin_vec_vpkswus
-#define vec_vpkshus __builtin_vec_vpkshus
-#define vec_re __builtin_vec_re
-#define vec_round __builtin_vec_round
-#define vec_recipdiv __builtin_vec_recipdiv
-#define vec_rlmi __builtin_vec_rlmi
-#define vec_vrlnm __builtin_vec_rlnm
 #define vec_rlnm(a,b,c) (__builtin_vec_rlnm((a),((c)<<8)|(b)))
-#define vec_rsqrt __builtin_vec_rsqrt
-#define vec_rsqrte __builtin_vec_rsqrte
-#define vec_signed __builtin_vec_vsigned
-#define vec_signed2 __builtin_vec_vsigned2
-#define vec_signede __builtin_vec_vsignede
-#define vec_signedo __builtin_vec_vsignedo
-#define vec_unsigned __builtin_vec_vunsigned
-#define vec_unsigned2 __builtin_vec_vunsigned2
-#define vec_unsignede __builtin_vec_vunsignede
-#define vec_unsignedo __builtin_vec_vunsignedo
-#define vec_vsubfp __builtin_vec_vsubfp
-#define vec_subc __builtin_vec_subc
-#define vec_sube __builtin_vec_sube
-#define vec_subec __builtin_vec_subec
-#define vec_vsubsws __builtin_vec_vsubsws
-#define vec_vsubshs __builtin_vec_vsubshs
-#define vec_vsubsbs __builtin_vec_vsubsbs
-#define vec_sum4s __builtin_vec_sum4s
-#define vec_vsum4shs __builtin_vec_vsum4shs
-#define vec_vsum4sbs __builtin_vec_vsum4sbs
-#define vec_vsum4ubs __builtin_vec_vsum4ubs
-#define vec_sum2s __builtin_vec_sum2s
-#define vec_sums __builtin_vec_sums
-#define vec_trunc __builtin_vec_trunc
-#define vec_vupkhpx __builtin_vec_vupkhpx
-#define vec_vupkhsh __builtin_vec_vupkhsh
-#define vec_vupkhsb __builtin_vec_vupkhsb
-#define vec_vupklpx __builtin_vec_vupklpx
-#define vec_vupklsh __builtin_vec_vupklsh
-#define vec_vupklsb __builtin_vec_vupklsb
-#define vec_abs __builtin_vec_abs
-#define vec_nabs __builtin_vec_nabs
-#define vec_abss __builtin_vec_abss
-#define vec_add __builtin_vec_add
-#define vec_adds __builtin_vec_adds
-#define vec_and __builtin_vec_and
-#define vec_andc __builtin_vec_andc
-#define vec_avg __builtin_vec_avg
-#define vec_cmpeq __builtin_vec_cmpeq
-#define vec_cmpne __builtin_vec_cmpne
-#define vec_cmpgt __builtin_vec_cmpgt
-#define vec_ctf __builtin_vec_ctf
-#define vec_dst __builtin_vec_dst
-#define vec_dstst __builtin_vec_dstst
-#define vec_dststt __builtin_vec_dststt
-#define vec_dstt __builtin_vec_dstt
-#define vec_ld __builtin_vec_ld
-#define vec_lde __builtin_vec_lde
-#define vec_ldl __builtin_vec_ldl
-#define vec_lvebx __builtin_vec_lvebx
-#define vec_lvehx __builtin_vec_lvehx
-#define vec_lvewx __builtin_vec_lvewx
-#define vec_xl_zext __builtin_vec_ze_lxvrx
-#define vec_xl_sext __builtin_vec_se_lxvrx
-#define vec_xst_trunc __builtin_vec_tr_stxvrx
-#define vec_neg __builtin_vec_neg
-#define vec_pmsum_be __builtin_vec_vpmsum
-#define vec_shasigma_be __builtin_crypto_vshasigma
-/* Cell only intrinsics.  */
-#ifdef __PPU__
-#define vec_lvlx __builtin_vec_lvlx
-#define vec_lvlxl __builtin_vec_lvlxl
-#define vec_lvrx __builtin_vec_lvrx
-#define vec_lvrxl __builtin_vec_lvrxl
-#endif
-#define vec_lvsl __builtin_vec_lvsl
-#define vec_lvsr __builtin_vec_lvsr
-#define vec_max __builtin_vec_max
-#define vec_mergee __builtin_vec_vmrgew
-#define vec_mergeh __builtin_vec_mergeh
-#define vec_mergel __builtin_vec_mergel
-#define vec_mergeo __builtin_vec_vmrgow
-#define vec_min __builtin_vec_min
-#define vec_mladd __builtin_vec_mladd
-#define vec_msum __builtin_vec_msum
-#define vec_msums __builtin_vec_msums
-#define vec_mul __builtin_vec_mul
-#define vec_mule __builtin_vec_mule
-#define vec_mulo __builtin_vec_mulo
-#define vec_nor __builtin_vec_nor
-#define vec_or __builtin_vec_or
-#define vec_pack __builtin_vec_pack
-#define vec_packs __builtin_vec_packs
-#define vec_packsu __builtin_vec_packsu
-#define vec_perm __builtin_vec_perm
-#define vec_rl __builtin_vec_rl
-#define vec_sel __builtin_vec_sel
-#define vec_sl __builtin_vec_sl
-#define vec_sld __builtin_vec_sld
-#define vec_sldw __builtin_vsx_xxsldwi
-#define vec_sll __builtin_vec_sll
-#define vec_slo __builtin_vec_slo
-#define vec_splat __builtin_vec_splat
-#define vec_sr __builtin_vec_sr
-#define vec_sra __builtin_vec_sra
-#define vec_srl __builtin_vec_srl
-#define vec_sro __builtin_vec_sro
-#define vec_st __builtin_vec_st
-#define vec_ste __builtin_vec_ste
-#define vec_stl __builtin_vec_stl
-#define vec_stvebx __builtin_vec_stvebx
-#define vec_stvehx __builtin_vec_stvehx
-#define vec_stvewx __builtin_vec_stvewx
-/* Cell only intrinsics.  */
-#ifdef __PPU__
-#define vec_stvlx __builtin_vec_stvlx
-#define vec_stvlxl __builtin_vec_stvlxl
-#define vec_stvrx __builtin_vec_stvrx
-#define vec_stvrxl __builtin_vec_stvrxl
-#endif
-#define vec_sub __builtin_vec_sub
-#define vec_subs __builtin_vec_subs
-#define vec_sum __builtin_vec_sum
-#define vec_unpackh __builtin_vec_unpackh
-#define vec_unpackl __builtin_vec_unpackl
-#define vec_vaddubm __builtin_vec_vaddubm
-#define vec_vaddubs __builtin_vec_vaddubs
-#define vec_vadduhm __builtin_vec_vadduhm
-#define vec_vadduhs __builtin_vec_vadduhs
-#define vec_vadduwm __builtin_vec_vadduwm
-#define vec_vadduws __builtin_vec_vadduws
-#define vec_vcmpequb __builtin_vec_vcmpequb
-#define vec_vcmpequh __builtin_vec_vcmpequh
-#define vec_vcmpequw __builtin_vec_vcmpequw
-#define vec_vmaxub __builtin_vec_vmaxub
-#define vec_vmaxuh __builtin_vec_vmaxuh
-#define vec_vmaxuw __builtin_vec_vmaxuw
-#define vec_vminub __builtin_vec_vminub
-#define vec_vminuh __builtin_vec_vminuh
-#define vec_vminuw __builtin_vec_vminuw
-#define vec_vmrghb __builtin_vec_vmrghb
-#define vec_vmrghh __builtin_vec_vmrghh
-#define vec_vmrghw __builtin_vec_vmrghw
-#define vec_vmrglb __builtin_vec_vmrglb
-#define vec_vmrglh __builtin_vec_vmrglh
-#define vec_vmrglw __builtin_vec_vmrglw
-#define vec_vpkuhum __builtin_vec_vpkuhum
-#define vec_vpkuwum __builtin_vec_vpkuwum
-#define vec_vrlb __builtin_vec_vrlb
-#define vec_vrlh __builtin_vec_vrlh
-#define vec_vrlw __builtin_vec_vrlw
-#define vec_vslb __builtin_vec_vslb
-#define vec_vslh __builtin_vec_vslh
-#define vec_vslw __builtin_vec_vslw
-#define vec_vspltb __builtin_vec_vspltb
-#define vec_vsplth __builtin_vec_vsplth
-#define vec_vspltw __builtin_vec_vspltw
-#define vec_vsrab __builtin_vec_vsrab
-#define vec_vsrah __builtin_vec_vsrah
-#define vec_vsraw __builtin_vec_vsraw
-#define vec_vsrb __builtin_vec_vsrb
-#define vec_vsrh __builtin_vec_vsrh
-#define vec_vsrw __builtin_vec_vsrw
-#define vec_vsububs __builtin_vec_vsububs
-#define vec_vsububm __builtin_vec_vsububm
-#define vec_vsubuhm __builtin_vec_vsubuhm
-#define vec_vsubuhs __builtin_vec_vsubuhs
-#define vec_vsubuwm __builtin_vec_vsubuwm
-#define vec_vsubuws __builtin_vec_vsubuws
-#define vec_xor __builtin_vec_xor
-
-#define vec_extract __builtin_vec_extract
-#define vec_insert __builtin_vec_insert
-#define vec_splats __builtin_vec_splats
-#define vec_promote __builtin_vec_promote
 
 #ifdef __VSX__
 /* VSX additions */
-#define vec_div __builtin_vec_div
-#define vec_mul __builtin_vec_mul
-#define vec_msub __builtin_vec_msub
-#define vec_nmadd __builtin_vec_nmadd
-#define vec_nearbyint __builtin_vec_nearbyint
-#define vec_rint __builtin_vec_rint
-#define vec_sqrt __builtin_vec_sqrt
 #define vec_vsx_ld __builtin_vec_vsx_ld
 #define vec_vsx_st __builtin_vec_vsx_st
-#define vec_xl __builtin_vec_vsx_ld
-#define vec_xl_be __builtin_vec_xl_be
-#define vec_xst __builtin_vec_vsx_st
-#define vec_xst_be __builtin_vec_xst_be
-
-/* Note, xxsldi and xxpermdi were added as __builtin_vsx_<xxx> functions
-   instead of __builtin_vec_<xxx>  */
-#define vec_xxsldwi __builtin_vsx_xxsldwi
-#define vec_xxpermdi __builtin_vsx_xxpermdi
-#endif
-
-#ifdef _ARCH_PWR8
-/* Vector additions added in ISA 2.07.  */
-#define vec_eqv __builtin_vec_eqv
-#define vec_nand __builtin_vec_nand
-#define vec_orc __builtin_vec_orc
-#define vec_vaddcuq __builtin_vec_vaddcuq
-#define vec_vaddudm __builtin_vec_vaddudm
-#define vec_vadduqm __builtin_vec_vadduqm
-#define vec_vbpermq __builtin_vec_vbpermq
-#define vec_bperm __builtin_vec_vbperm_api
-#define vec_vclz __builtin_vec_vclz
-#define vec_cntlz __builtin_vec_vclz
-#define vec_vclzb __builtin_vec_vclzb
-#define vec_vclzd __builtin_vec_vclzd
-#define vec_vclzh __builtin_vec_vclzh
-#define vec_vclzw __builtin_vec_vclzw
-#define vec_vaddecuq __builtin_vec_vaddecuq
-#define vec_vaddeuqm __builtin_vec_vaddeuqm
-#define vec_vsubecuq __builtin_vec_vsubecuq
-#define vec_vsubeuqm __builtin_vec_vsubeuqm
-#define vec_vgbbd __builtin_vec_vgbbd
-#define vec_gb __builtin_vec_vgbbd
-#define vec_vmaxsd __builtin_vec_vmaxsd
-#define vec_vmaxud __builtin_vec_vmaxud
-#define vec_vminsd __builtin_vec_vminsd
-#define vec_vminud __builtin_vec_vminud
-#define vec_vmrgew __builtin_vec_vmrgew
-#define vec_vmrgow __builtin_vec_vmrgow
-#define vec_vpksdss __builtin_vec_vpksdss
-#define vec_vpksdus __builtin_vec_vpksdus
-#define vec_vpkudum __builtin_vec_vpkudum
-#define vec_vpkudus __builtin_vec_vpkudus
-#define vec_vpopcnt __builtin_vec_vpopcnt
-#define vec_vpopcntb __builtin_vec_vpopcntb
-#define vec_vpopcntd __builtin_vec_vpopcntd
-#define vec_vpopcnth __builtin_vec_vpopcnth
-#define vec_vpopcntw __builtin_vec_vpopcntw
-#define vec_popcnt __builtin_vec_vpopcntu
-#define vec_vrld __builtin_vec_vrld
-#define vec_vsld __builtin_vec_vsld
-#define vec_vsrad __builtin_vec_vsrad
-#define vec_vsrd __builtin_vec_vsrd
-#define vec_vsubcuq __builtin_vec_vsubcuq
-#define vec_vsubudm __builtin_vec_vsubudm
-#define vec_vsubuqm __builtin_vec_vsubuqm
-#define vec_vupkhsw __builtin_vec_vupkhsw
-#define vec_vupklsw __builtin_vec_vupklsw
-#define vec_revb __builtin_vec_revb
-#define vec_sbox_be __builtin_crypto_vsbox_be
-#define vec_cipher_be __builtin_crypto_vcipher_be
-#define vec_cipherlast_be __builtin_crypto_vcipherlast_be
-#define vec_ncipher_be __builtin_crypto_vncipher_be
-#define vec_ncipherlast_be __builtin_crypto_vncipherlast_be
-#endif
-
-#ifdef __POWER9_VECTOR__
-/* Vector additions added in ISA 3.0.  */
-#define vec_first_match_index __builtin_vec_first_match_index
-#define vec_first_match_or_eos_index __builtin_vec_first_match_or_eos_index
-#define vec_first_mismatch_index __builtin_vec_first_mismatch_index
-#define vec_first_mismatch_or_eos_index __builtin_vec_first_mismatch_or_eos_index
-#define vec_pack_to_short_fp32 __builtin_vec_convert_4f32_8f16
-#define vec_parity_lsbb __builtin_vec_vparity_lsbb
-#define vec_vctz __builtin_vec_vctz
-#define vec_cnttz __builtin_vec_vctz
-#define vec_vctzb __builtin_vec_vctzb
-#define vec_vctzd __builtin_vec_vctzd
-#define vec_vctzh __builtin_vec_vctzh
-#define vec_vctzw __builtin_vec_vctzw
-#define vec_extract4b __builtin_vec_extract4b
-#define vec_insert4b __builtin_vec_insert4b
-#define vec_vprtyb __builtin_vec_vprtyb
-#define vec_vprtybd __builtin_vec_vprtybd
-#define vec_vprtybw __builtin_vec_vprtybw
-
-#ifdef _ARCH_PPC64
-#define vec_vprtybq __builtin_vec_vprtybq
-#endif
-
-#define vec_absd __builtin_vec_vadu
-#define vec_absdb __builtin_vec_vadub
-#define vec_absdh __builtin_vec_vaduh
-#define vec_absdw __builtin_vec_vaduw
-
-#define vec_slv __builtin_vec_vslv
-#define vec_srv __builtin_vec_vsrv
-
-#define vec_extract_exp __builtin_vec_extract_exp
-#define vec_extract_sig __builtin_vec_extract_sig
-#define vec_insert_exp __builtin_vec_insert_exp
-#define vec_test_data_class __builtin_vec_test_data_class
-
-#define vec_extract_fp_from_shorth __builtin_vec_vextract_fp_from_shorth
-#define vec_extract_fp_from_shortl __builtin_vec_vextract_fp_from_shortl
-#define vec_extract_fp32_from_shorth __builtin_vec_vextract_fp_from_shorth
-#define vec_extract_fp32_from_shortl __builtin_vec_vextract_fp_from_shortl
-
-#define scalar_extract_exp __builtin_vec_scalar_extract_exp
-#define scalar_extract_sig __builtin_vec_scalar_extract_sig
-#define scalar_insert_exp __builtin_vec_scalar_insert_exp
-#define scalar_test_data_class __builtin_vec_scalar_test_data_class
-#define scalar_test_neg __builtin_vec_scalar_test_neg
-
-#define scalar_cmp_exp_gt __builtin_vec_scalar_cmp_exp_gt
-#define scalar_cmp_exp_lt __builtin_vec_scalar_cmp_exp_lt
-#define scalar_cmp_exp_eq __builtin_vec_scalar_cmp_exp_eq
-#define scalar_cmp_exp_unordered __builtin_vec_scalar_cmp_exp_unordered
-
-#ifdef _ARCH_PPC64
-#define vec_xl_len __builtin_vec_lxvl
-#define vec_xst_len __builtin_vec_stxvl
-#define vec_xl_len_r __builtin_vec_xl_len_r
-#define vec_xst_len_r __builtin_vec_xst_len_r
-#endif
-
-#define vec_cmpnez __builtin_vec_vcmpnez
-
-#define vec_cntlz_lsbb __builtin_vec_vclzlsbb
-#define vec_cnttz_lsbb __builtin_vec_vctzlsbb
-
-#define vec_test_lsbb_all_ones __builtin_vec_xvtlsbb_all_ones
-#define vec_test_lsbb_all_zeros __builtin_vec_xvtlsbb_all_zeros
-
-#define vec_xlx __builtin_vec_vextulx
-#define vec_xrx __builtin_vec_vexturx
-#define vec_signexti  __builtin_vec_vsignexti
-#define vec_signextll __builtin_vec_vsignextll
+#define __builtin_vec_xl __builtin_vec_vsx_ld
+#define __builtin_vec_xst __builtin_vec_vsx_st
 
-#endif
-
-/* BCD builtins, map ABI builtin name to existing builtin name.  */
-#define __builtin_bcdadd     __builtin_vec_bcdadd
-#define __builtin_bcdadd_lt  __builtin_vec_bcdadd_lt
-#define __builtin_bcdadd_eq  __builtin_vec_bcdadd_eq
-#define __builtin_bcdadd_gt  __builtin_vec_bcdadd_gt
 #define __builtin_bcdadd_ofl __builtin_vec_bcdadd_ov
-#define __builtin_bcdadd_ov  __builtin_vec_bcdadd_ov
-#define __builtin_bcdsub     __builtin_vec_bcdsub
-#define __builtin_bcdsub_lt  __builtin_vec_bcdsub_lt
-#define __builtin_bcdsub_eq  __builtin_vec_bcdsub_eq
-#define __builtin_bcdsub_gt  __builtin_vec_bcdsub_gt
 #define __builtin_bcdsub_ofl __builtin_vec_bcdsub_ov
-#define __builtin_bcdsub_ov  __builtin_vec_bcdsub_ov
-#define __builtin_bcdinvalid __builtin_vec_bcdinvalid
-#define __builtin_bcdmul10   __builtin_vec_bcdmul10
-#define __builtin_bcddiv10   __builtin_vec_bcddiv10
-#define __builtin_bcd2dfp    __builtin_vec_denb2dfp
 #define __builtin_bcdcmpeq(a,b)   __builtin_vec_bcdsub_eq(a,b,0)
 #define __builtin_bcdcmpgt(a,b)   __builtin_vec_bcdsub_gt(a,b,0)
 #define __builtin_bcdcmplt(a,b)   __builtin_vec_bcdsub_lt(a,b,0)
 #define __builtin_bcdcmpge(a,b)   __builtin_vec_bcdsub_ge(a,b,0)
 #define __builtin_bcdcmple(a,b)   __builtin_vec_bcdsub_le(a,b,0)
+#endif
 
+/* For _ARCH_PWR10.  Always define to support #pragma GCC target.  */
+#define __builtin_vec_se_lxvrx __builtin_vec_xl_sext
+#define __builtin_vec_tr_stxvrx __builtin_vec_xst_trunc
+#define __builtin_vec_ze_lxvrx __builtin_vec_xl_zext
+#define __builtin_vsx_xxpermx __builtin_vec_xxpermx
 
 /* Predicates.
    For C++, we use templates in order to allow non-parenthesized arguments.
@@ -700,14 +317,9 @@ __altivec_scalar_pred(vec_any_nle,
 #define vec_any_nle(a1, a2) __builtin_vec_vcmpge_p (__CR6_LT_REV, (a2), (a1))
 #endif
 
-/* These do not accept vectors, so they do not have a __builtin_vec_*
-   counterpart.  */
+/* Miscellaneous definitions.  */
 #define vec_dss(x) __builtin_altivec_dss((x))
 #define vec_dssall() __builtin_altivec_dssall ()
-#define vec_mfvscr() ((__vector unsigned short) __builtin_altivec_mfvscr ())
-#define vec_splat_s8(x) __builtin_altivec_vspltisb ((x))
-#define vec_splat_s16(x) __builtin_altivec_vspltish ((x))
-#define vec_splat_s32(x) __builtin_altivec_vspltisw ((x))
 #define vec_splat_u8(x) ((__vector unsigned char) vec_splat_s8 ((x)))
 #define vec_splat_u16(x) ((__vector unsigned short) vec_splat_s16 ((x)))
 #define vec_splat_u32(x) ((__vector unsigned int) vec_splat_s32 ((x)))
@@ -716,59 +328,4 @@ __altivec_scalar_pred(vec_any_nle,
    to #define vec_step to __builtin_vec_step.  */
 #define vec_step(x) __builtin_vec_step (* (__typeof__ (x) *) 0)
 
-#ifdef _ARCH_PWR10
-#define vec_signextq  __builtin_vec_vsignextq
-#define vec_dive __builtin_vec_dive
-#define vec_mod  __builtin_vec_mod
-
-/* May modify these macro definitions if future capabilities overload
-   with support for different vector argument and result types.  */
-#define vec_cntlzm(a, b)	__builtin_altivec_vclzdm (a, b)
-#define vec_cnttzm(a, b)	__builtin_altivec_vctzdm (a, b)
-#define vec_pdep(a, b)	__builtin_altivec_vpdepd (a, b)
-#define vec_pext(a, b)	__builtin_altivec_vpextd (a, b)
-#define vec_cfuge(a, b)	__builtin_altivec_vcfuged (a, b)
-#define vec_genpcvm(a, b)	__builtin_vec_xxgenpcvm (a, b)
-
-/* Overloaded built-in functions for ISA 3.1.  */
-#define vec_extractl(a, b, c)	__builtin_vec_extractl (a, b, c)
-#define vec_extracth(a, b, c)	__builtin_vec_extracth (a, b, c)
-#define vec_insertl(a, b, c)   __builtin_vec_insertl (a, b, c)
-#define vec_inserth(a, b, c)   __builtin_vec_inserth (a, b, c)
-#define vec_replace_elt(a, b, c)       __builtin_vec_replace_elt (a, b, c)
-#define vec_replace_unaligned(a, b, c) __builtin_vec_replace_un (a, b, c)
-#define vec_sldb(a, b, c)      __builtin_vec_sldb (a, b, c)
-#define vec_srdb(a, b, c)      __builtin_vec_srdb (a, b, c)
-#define vec_splati(a)  __builtin_vec_xxspltiw (a)
-#define vec_splatid(a) __builtin_vec_xxspltid (a)
-#define vec_splati_ins(a, b, c)        __builtin_vec_xxsplti32dx (a, b, c)
-#define vec_blendv(a, b, c)    __builtin_vec_xxblend (a, b, c)
-#define vec_permx(a, b, c, d)  __builtin_vec_xxpermx (a, b, c, d)
-
-#define vec_gnb(a, b)	__builtin_vec_gnb (a, b)
-#define vec_clrl(a, b)	__builtin_vec_clrl (a, b)
-#define vec_clrr(a, b)	__builtin_vec_clrr (a, b)
-#define vec_ternarylogic(a, b, c, d)	__builtin_vec_xxeval (a, b, c, d)
-
-#define vec_strir(a)	__builtin_vec_strir (a)
-#define vec_stril(a)	__builtin_vec_stril (a)
-
-#define vec_strir_p(a)	__builtin_vec_strir_p (a)
-#define vec_stril_p(a)	__builtin_vec_stril_p (a)
-
-#define vec_mulh(a, b) __builtin_vec_mulh ((a), (b))
-#define vec_dive(a, b) __builtin_vec_dive ((a), (b))
-#define vec_mod(a, b) __builtin_vec_mod ((a), (b))
-
-/* VSX Mask Manipulation builtin. */
-#define vec_genbm __builtin_vec_mtvsrbm
-#define vec_genhm __builtin_vec_mtvsrhm
-#define vec_genwm __builtin_vec_mtvsrwm
-#define vec_gendm __builtin_vec_mtvsrdm
-#define vec_genqm __builtin_vec_mtvsrqm
-#define vec_cntm __builtin_vec_cntm
-#define vec_expandm __builtin_vec_vexpandm
-#define vec_extractm __builtin_vec_vextractm
-#endif
-
 #endif /* _ALTIVEC_H */
-- 
2.27.0


^ permalink raw reply	[flat|nested] 52+ messages in thread

* [PATCH 16/18] rs6000: Test case adjustments
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (14 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 15/18] rs6000: Update altivec.h for automated interfaces Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 22:37   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 17/18] rs6000: Enable the new builtin support Bill Schmidt
                   ` (2 subsequent siblings)
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

2021-07-19  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/testsuite/
	* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-extract-sig-2.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-insert-exp-2.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-insert-exp-5.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-test-neg-2.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-test-neg-3.c: Adjust.
	* gcc.target/powerpc/bfp/scalar-test-neg-5.c: Adjust.
	* gcc.target/powerpc/byte-in-set-2.c: Adjust.
	* gcc.target/powerpc/cmpb-2.c: Adjust.
	* gcc.target/powerpc/cmpb32-2.c: Adjust.
	* gcc.target/powerpc/crypto-builtin-2.c: Adjust.
	* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Adjust.
	* gcc.target/powerpc/fold-vec-splat-longlong.c: Adjust.
	* gcc.target/powerpc/fold-vec-splat-misc-invalid.c: Adjust.
	* gcc.target/powerpc/int_128bit-runnable.c: Adjust.
	* gcc.target/powerpc/p8vector-builtin-8.c: Adjust.
	* gcc.target/powerpc/pr80315-1.c: Adjust.
	* gcc.target/powerpc/pr80315-2.c: Adjust.
	* gcc.target/powerpc/pr80315-3.c: Adjust.
	* gcc.target/powerpc/pr80315-4.c: Adjust.
	* gcc.target/powerpc/pr88100.c: Adjust.
	* gcc.target/powerpc/pragma_misc9.c: Adjust.
	* gcc.target/powerpc/pragma_power8.c: Adjust.
	* gcc.target/powerpc/pragma_power9.c: Adjust.
	* gcc.target/powerpc/test_fpscr_drn_builtin_error.c: Adjust.
	* gcc.target/powerpc/test_fpscr_rn_builtin_error.c: Adjust.
	* gcc.target/powerpc/test_mffsl.c: Adjust.
	* gcc.target/powerpc/vec-gnb-2.c: Adjust.
	* gcc.target/powerpc/vsu/vec-all-nez-7.c: Adjust.
	* gcc.target/powerpc/vsu/vec-any-eqz-7.c: Adjust.
	* gcc.target/powerpc/vsu/vec-cmpnez-7.c: Adjust.
	* gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c: Adjust.
	* gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c: Adjust.
	* gcc.target/powerpc/vsu/vec-xl-len-13.c: Adjust.
	* gcc.target/powerpc/vsu/vec-xst-len-12.c: Adjust.
---
 .../gcc.target/powerpc/bfp/scalar-extract-exp-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-extract-sig-2.c  |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-2.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-5.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-insert-exp-8.c   |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-2.c     |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-3.c     |  2 +-
 .../gcc.target/powerpc/bfp/scalar-test-neg-5.c     |  2 +-
 gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c   |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb-2.c          |  2 +-
 gcc/testsuite/gcc.target/powerpc/cmpb32-2.c        |  2 +-
 .../gcc.target/powerpc/crypto-builtin-2.c          | 14 +++++++-------
 .../powerpc/fold-vec-splat-floatdouble.c           |  4 ++--
 .../gcc.target/powerpc/fold-vec-splat-longlong.c   | 10 +++-------
 .../powerpc/fold-vec-splat-misc-invalid.c          |  8 ++++----
 .../gcc.target/powerpc/int_128bit-runnable.c       |  6 +++---
 .../gcc.target/powerpc/p8vector-builtin-8.c        |  1 +
 gcc/testsuite/gcc.target/powerpc/pr80315-1.c       |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-2.c       |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-3.c       |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr80315-4.c       |  2 +-
 gcc/testsuite/gcc.target/powerpc/pr88100.c         | 12 ++++++------
 gcc/testsuite/gcc.target/powerpc/pragma_misc9.c    |  2 +-
 gcc/testsuite/gcc.target/powerpc/pragma_power8.c   |  2 ++
 gcc/testsuite/gcc.target/powerpc/pragma_power9.c   |  3 +++
 .../powerpc/test_fpscr_drn_builtin_error.c         |  4 ++--
 .../powerpc/test_fpscr_rn_builtin_error.c          | 12 ++++++------
 gcc/testsuite/gcc.target/powerpc/test_mffsl.c      |  3 ++-
 gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c       |  2 +-
 .../gcc.target/powerpc/vsu/vec-all-nez-7.c         |  2 +-
 .../gcc.target/powerpc/vsu/vec-any-eqz-7.c         |  2 +-
 .../gcc.target/powerpc/vsu/vec-cmpnez-7.c          |  2 +-
 .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c      |  2 +-
 .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c      |  2 +-
 .../gcc.target/powerpc/vsu/vec-xl-len-13.c         |  2 +-
 .../gcc.target/powerpc/vsu/vec-xst-len-12.c        |  2 +-
 36 files changed, 65 insertions(+), 62 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
index 922180675fc..53b67c95cf9 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-exp-2.c
@@ -14,7 +14,7 @@ get_exponent (double *p)
 {
   double source = *p;
 
-  return scalar_extract_exp (source);	/* { dg-error "'__builtin_vec_scalar_extract_exp' is not supported in this compiler configuration" } */
+  return scalar_extract_exp (source);	/* { dg-error "'__builtin_vsx_scalar_extract_exp' requires the" } */
 }
 
 
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
index e24d4bd23fe..39ee74c94dc 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-extract-sig-2.c
@@ -12,5 +12,5 @@ get_significand (double *p)
 {
   double source = *p;
 
-  return __builtin_vec_scalar_extract_sig (source); /* { dg-error "'__builtin_vec_scalar_extract_sig' is not supported in this compiler configuration" } */
+  return __builtin_vec_scalar_extract_sig (source); /* { dg-error "'__builtin_vsx_scalar_extract_sig' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
index feb943104da..efd69725905 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-2.c
@@ -16,5 +16,5 @@ insert_exponent (unsigned long long int *significand_p,
   unsigned long long int significand = *significand_p;
   unsigned long long int exponent = *exponent_p;
 
-  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vec_scalar_insert_exp' is not supported in this compiler configuration" } */
+  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vsx_scalar_insert_exp' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
index 0e5683d5d1a..f85966a6fdf 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-5.c
@@ -16,5 +16,5 @@ insert_exponent (double *significand_p,
   double significand = *significand_p;
   unsigned long long int exponent = *exponent_p;
 
-  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vec_scalar_insert_exp' is not supported in this compiler configuration" } */
+  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vsx_scalar_insert_exp_dp' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-8.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-8.c
index bd68f770985..b1be8284b4e 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-insert-exp-8.c
@@ -16,5 +16,5 @@ insert_exponent (unsigned __int128 *significand_p, /* { dg-error "'__int128' is
   unsigned __int128 significand = *significand_p;  /* { dg-error "'__int128' is not supported on this target" } */
   unsigned long long int exponent = *exponent_p;
 
-  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vec_scalar_insert_exp' is not supported in this compiler configuration" } */
+  return scalar_insert_exp (significand, exponent); /* { dg-error "'__builtin_vsx_scalar_insert_exp' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c
index 7d2b4deefc3..46d743a899b 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-2.c
@@ -10,5 +10,5 @@ test_neg (float *p)
 {
   float source = *p;
 
-  return __builtin_vec_scalar_test_neg_sp (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_sp' requires" } */
+  return __builtin_vec_scalar_test_neg (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_sp' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-3.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-3.c
index b503dfa8b56..bfc892b116e 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-3.c
@@ -10,5 +10,5 @@ test_neg (double *p)
 {
   double source = *p;
 
-  return __builtin_vec_scalar_test_neg_dp (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_dp' requires" } */
+  return __builtin_vec_scalar_test_neg (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_dp' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c
index bab86040a7b..8c55c1cfb5c 100644
--- a/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c
+++ b/gcc/testsuite/gcc.target/powerpc/bfp/scalar-test-neg-5.c
@@ -10,5 +10,5 @@ test_neg (__ieee128 *p)
 {
   __ieee128 source = *p;
 
-  return __builtin_vec_scalar_test_neg_qp (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_qp' requires" } */
+  return __builtin_vec_scalar_test_neg (source); /* { dg-error "'__builtin_vsx_scalar_test_neg_qp' requires" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
index 44cc7782760..4c676ba356d 100644
--- a/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/byte-in-set-2.c
@@ -10,5 +10,5 @@
 int
 test_byte_in_set (unsigned char b, unsigned long long set_members)
 {
-  return __builtin_byte_in_set (b, set_members); /* { dg-warning "implicit declaration of function" } */
+  return __builtin_byte_in_set (b, set_members); /* { dg-error "'__builtin_scalar_byte_in_set' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
index 113ab6a5f99..02b84d0731d 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb-2.c
@@ -8,7 +8,7 @@ void abort ();
 unsigned long long int
 do_compare (unsigned long long int a, unsigned long long int b)
 {
-  return __builtin_cmpb (a, b);	/* { dg-warning "implicit declaration of function '__builtin_cmpb'" } */
+  return __builtin_cmpb (a, b);	/* { dg-error "'__builtin_p6_cmpb' requires the '-mcpu=power6' option" } */
 }
 
 void
diff --git a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
index 37b54745e0e..d4264ab6e7d 100644
--- a/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/cmpb32-2.c
@@ -7,7 +7,7 @@ void abort ();
 unsigned int
 do_compare (unsigned int a, unsigned int b)
 {
-  return __builtin_cmpb (a, b);  /* { dg-warning "implicit declaration of function '__builtin_cmpb'" } */
+  return __builtin_cmpb (a, b);  /* { dg-error "'__builtin_p6_cmpb_32' requires the '-mcpu=power6' option" } */
 }
 
 void
diff --git a/gcc/testsuite/gcc.target/powerpc/crypto-builtin-2.c b/gcc/testsuite/gcc.target/powerpc/crypto-builtin-2.c
index 4066b1228dc..b3a6c737a3e 100644
--- a/gcc/testsuite/gcc.target/powerpc/crypto-builtin-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/crypto-builtin-2.c
@@ -5,21 +5,21 @@
 
 void use_builtins_d (__vector unsigned long long *p, __vector unsigned long long *q, __vector unsigned long long *r, __vector unsigned long long *s)
 {
-  p[0] = __builtin_crypto_vcipher (q[0], r[0]); /* { dg-error "'__builtin_crypto_vcipher' is not supported with the current options" } */
-  p[1] = __builtin_crypto_vcipherlast (q[1], r[1]); /* { dg-error "'__builtin_crypto_vcipherlast' is not supported with the current options" } */
-  p[2] = __builtin_crypto_vncipher (q[2], r[2]); /* { dg-error "'__builtin_crypto_vncipher' is not supported with the current options" } */
-  p[3] = __builtin_crypto_vncipherlast (q[3], r[3]); /* { dg-error "'__builtin_crypto_vncipherlast' is not supported with the current options" } */
+  p[0] = __builtin_crypto_vcipher (q[0], r[0]); /* { dg-error "'__builtin_crypto_vcipher' requires the '-mcrypto' option" } */
+  p[1] = __builtin_crypto_vcipherlast (q[1], r[1]); /* { dg-error "'__builtin_crypto_vcipherlast' requires the '-mcrypto' option" } */
+  p[2] = __builtin_crypto_vncipher (q[2], r[2]); /* { dg-error "'__builtin_crypto_vncipher' requires the '-mcrypto' option" } */
+  p[3] = __builtin_crypto_vncipherlast (q[3], r[3]); /* { dg-error "'__builtin_crypto_vncipherlast' requires the '-mcrypto' option" } */
   p[4] = __builtin_crypto_vpermxor (q[4], r[4], s[4]);
   p[5] = __builtin_crypto_vpmsumd (q[5], r[5]);
-  p[6] = __builtin_crypto_vshasigmad (q[6], 1, 15); /* { dg-error "'__builtin_crypto_vshasigmad' is not supported with the current options" } */
-  p[7] = __builtin_crypto_vsbox (q[7]); /* { dg-error "'__builtin_crypto_vsbox' is not supported with the current options" } */
+  p[6] = __builtin_crypto_vshasigmad (q[6], 1, 15); /* { dg-error "'__builtin_crypto_vshasigmad' requires the '-mcrypto' option" } */
+  p[7] = __builtin_crypto_vsbox (q[7]); /* { dg-error "'__builtin_crypto_vsbox' requires the '-mcrypto' option" } */
 }
 
 void use_builtins_w (__vector unsigned int *p, __vector unsigned int *q, __vector unsigned int *r, __vector unsigned int *s)
 {
   p[0] = __builtin_crypto_vpermxor (q[0], r[0], s[0]);
   p[1] = __builtin_crypto_vpmsumw (q[1], r[1]);
-  p[2] = __builtin_crypto_vshasigmaw (q[2], 1, 15); /* { dg-error "'__builtin_crypto_vshasigmaw' is not supported with the current options" } */
+  p[2] = __builtin_crypto_vshasigmaw (q[2], 1, 15); /* { dg-error "'__builtin_crypto_vshasigmaw' requires the '-mcrypto' option" } */
 }
 
 void use_builtins_h (__vector unsigned short *p, __vector unsigned short *q, __vector unsigned short *r, __vector unsigned short *s)
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
index 76619177388..b95fa324633 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
@@ -18,7 +18,7 @@ vector float test_fc ()
 vector double testd_00 (vector double x) { return vec_splat (x, 0b00000); }
 vector double testd_01 (vector double x) { return vec_splat (x, 0b00001); }
 vector double test_dc ()
-{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00010); }
+{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00001); }
 
 /* If the source vector is a known constant, we will generate a load or possibly
    XXSPLTIW.  */
@@ -28,5 +28,5 @@ vector double test_dc ()
 /* { dg-final { scan-assembler-times {\mvspltw\M|\mxxspltw\M} 3 } } */
 
 /* For double types, we will generate xxpermdi instructions.  */
-/* { dg-final { scan-assembler-times "xxpermdi" 3 } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
index b95b987abce..3fa1f05d6f5 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
@@ -9,23 +9,19 @@
 
 vector bool long long testb_00 (vector bool long long x) { return vec_splat (x, 0b00000); }
 vector bool long long testb_01 (vector bool long long x) { return vec_splat (x, 0b00001); }
-vector bool long long testb_02 (vector bool long long x) { return vec_splat (x, 0b00010); }
 
 vector signed long long tests_00 (vector signed long long x) { return vec_splat (x, 0b00000); }
 vector signed long long tests_01 (vector signed long long x) { return vec_splat (x, 0b00001); }
-vector signed long long tests_02 (vector signed long long x) { return vec_splat (x, 0b00010); }
 
 vector unsigned long long testu_00 (vector unsigned long long x) { return vec_splat (x, 0b00000); }
 vector unsigned long long testu_01 (vector unsigned long long x) { return vec_splat (x, 0b00001); }
-vector unsigned long long testu_02 (vector unsigned long long x) { return vec_splat (x, 0b00010); }
 
 /* Similar test as above, but the source vector is a known constant. */
-vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00010); }
-vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00010); }
-vector unsigned long long test_ull () { const vector unsigned long long y = {56, 67}; return vec_splat (y, 0b00010); }
+vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00001); }
+vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00001); }
 
 /* Assorted load instructions for the initialization with known constants. */
-/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M|\mxxspltib\M} 2 } } */
 
 /* xxpermdi for vec_splat of long long vectors.
  At the time of this writing, the number of xxpermdi instructions
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-misc-invalid.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-misc-invalid.c
index 20f5b05561e..263a1723d31 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-misc-invalid.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-misc-invalid.c
@@ -10,24 +10,24 @@
 vector signed short
 testss_1 (unsigned int ui)
 {
-  return vec_splat_s16 (ui);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_s16 (ui);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector unsigned short
 testss_2 (signed int si)
 {
-  return vec_splat_u16 (si);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_u16 (si);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector signed char
 testsc_1 (unsigned int ui)
 {
-  return vec_splat_s8 (ui); /* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_s8 (ui); /* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector unsigned char
 testsc_2 (signed int si)
 {
-  return vec_splat_u8 (si);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_u8 (si);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
index 1255ee9f0ab..1356793635a 100644
--- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
@@ -11,9 +11,9 @@
 /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
-/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
+/* { dg-final { scan-assembler-times {\mvcmpequq\M} 24 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 26 } } */
+/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 26 } } */
 /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
index 0cfbe68c3a4..1d09aad9fbf 100644
--- a/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
+++ b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
@@ -126,6 +126,7 @@ void foo (vector signed char *vscr,
 /* { dg-final { scan-assembler-times "vsubcuw" 4 } } */
 /* { dg-final { scan-assembler-times "vsubuwm" 4 } } */
 /* { dg-final { scan-assembler-times "vbpermq" 2 } } */
+/* { dg-final { scan-assembler-times "vbpermd" 0 } } */
 /* { dg-final { scan-assembler-times "xxleqv" 4 } } */
 /* { dg-final { scan-assembler-times "vgbbd" 1 } } */
 /* { dg-final { scan-assembler-times "xxlnand" 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80315-1.c b/gcc/testsuite/gcc.target/powerpc/pr80315-1.c
index e2db0ff4b5f..f37f1f169a2 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80315-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80315-1.c
@@ -10,6 +10,6 @@ main()
   int mask;
 
   /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
-  res = __builtin_crypto_vshasigmaw (test, 1, 0xff); /* { dg-error {argument 3 must be in the range \[0, 15\]} } */
+  res = __builtin_crypto_vshasigmaw (test, 1, 0xff); /* { dg-error {argument 3 must be a 4-bit unsigned literal} } */
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80315-2.c b/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
index 144b705c012..0819a0511b7 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80315-2.c
@@ -10,6 +10,6 @@ main ()
   int mask;
 
   /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
-  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error {argument 3 must be in the range \[0, 15\]} } */
+  res = __builtin_crypto_vshasigmad (test, 1, 0xff); /* { dg-error {argument 3 must be a 4-bit unsigned literal} } */
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80315-3.c b/gcc/testsuite/gcc.target/powerpc/pr80315-3.c
index 99a3e24eadd..cc2e46cf5cb 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80315-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80315-3.c
@@ -12,6 +12,6 @@ main ()
   int mask;
 
   /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
-  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error {argument 3 must be in the range \[0, 15\]} } */
+  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error {argument 3 must be a 4-bit unsigned literal} } */
   return res;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr80315-4.c b/gcc/testsuite/gcc.target/powerpc/pr80315-4.c
index 7f5f6f75029..ac12910741b 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr80315-4.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr80315-4.c
@@ -12,6 +12,6 @@ main ()
   int mask;
 
   /* Argument 2 must be 0 or 1.  Argument 3 must be in range 0..15.  */
-  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error {argument 3 must be in the range \[0, 15\]} } */
+  res = vec_shasigma_be (test, 1, 0xff); /* { dg-error {argument 3 must be a 4-bit unsigned literal} } */
   return res;
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/pr88100.c b/gcc/testsuite/gcc.target/powerpc/pr88100.c
index 4452145ce95..764c897a497 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr88100.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr88100.c
@@ -10,35 +10,35 @@
 vector unsigned char
 splatu1 (void)
 {
-  return vec_splat_u8(0x100);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_u8(0x100);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector unsigned short
 splatu2 (void)
 {
-  return vec_splat_u16(0x10000);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_u16(0x10000);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector unsigned int
 splatu3 (void)
 {
-  return vec_splat_u32(0x10000000);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_u32(0x10000000);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector signed char
 splats1 (void)
 {
-  return vec_splat_s8(0x100);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_s8(0x100);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector signed short
 splats2 (void)
 {
-  return vec_splat_s16(0x10000);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_s16(0x10000);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
 
 vector signed int
 splats3 (void)
 {
-  return vec_splat_s32(0x10000000);/* { dg-error "argument 1 must be a 5-bit signed literal" } */
+  return vec_splat_s32(0x10000000);/* { dg-error "argument 1 must be a literal between -16 and 15, inclusive" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma_misc9.c b/gcc/testsuite/gcc.target/powerpc/pragma_misc9.c
index e03099bd084..61274463653 100644
--- a/gcc/testsuite/gcc.target/powerpc/pragma_misc9.c
+++ b/gcc/testsuite/gcc.target/powerpc/pragma_misc9.c
@@ -20,7 +20,7 @@ vector bool int
 test2 (vector signed int a, vector signed int b)
 {
   return vec_cmpnez (a, b);
-  /* { dg-error "'__builtin_altivec_vcmpnezw' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
+  /* { dg-error "'__builtin_altivec_vcmpnezw' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
 }
 
 #pragma GCC target ("cpu=power7")
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
index c8d2cdd6c1a..cb0f30844d3 100644
--- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
+++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
@@ -19,6 +19,7 @@ test1 (vector int a, vector int b)
 #pragma GCC target ("cpu=power7")
 /* Force a re-read of altivec.h with new cpu target. */
 #undef _ALTIVEC_H
+#undef _RS6000_VECDEFINES_H
 #include <altivec.h>
 #ifdef _ARCH_PWR7
 vector signed int
@@ -33,6 +34,7 @@ test2 (vector signed int a, vector signed int b)
 #pragma GCC target ("cpu=power8")
 /* Force a re-read of altivec.h with new cpu target. */
 #undef _ALTIVEC_H
+#undef _RS6000_VECDEFINES_H
 #include <altivec.h>
 #ifdef _ARCH_PWR8
 vector int
diff --git a/gcc/testsuite/gcc.target/powerpc/pragma_power9.c b/gcc/testsuite/gcc.target/powerpc/pragma_power9.c
index e33aad1aaf7..e05f1f4ddfa 100644
--- a/gcc/testsuite/gcc.target/powerpc/pragma_power9.c
+++ b/gcc/testsuite/gcc.target/powerpc/pragma_power9.c
@@ -17,6 +17,7 @@ test1 (vector int a, vector int b)
 
 #pragma GCC target ("cpu=power7")
 #undef _ALTIVEC_H
+#undef _RS6000_VECDEFINES_H
 #include <altivec.h>
 #ifdef _ARCH_PWR7
 vector signed int
@@ -30,6 +31,7 @@ test2 (vector signed int a, vector signed int b)
 
 #pragma GCC target ("cpu=power8")
 #undef _ALTIVEC_H
+#undef _RS6000_VECDEFINES_H
 #include <altivec.h>
 #ifdef _ARCH_PWR8
 vector int
@@ -50,6 +52,7 @@ test3b (vec_t a, vec_t b)
 
 #pragma GCC target ("cpu=power9,power9-vector")
 #undef _ALTIVEC_H
+#undef _RS6000_VECDEFINES_H
 #include <altivec.h>
 #ifdef _ARCH_PWR9
 vector bool int
diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_drn_builtin_error.c b/gcc/testsuite/gcc.target/powerpc/test_fpscr_drn_builtin_error.c
index 028ab0b6d66..4f9d9e08e8a 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_fpscr_drn_builtin_error.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_drn_builtin_error.c
@@ -9,8 +9,8 @@ int main ()
      __builtin_set_fpscr_drn() also support a variable as an argument but
      can't test variable value at compile time.  */
 
-  __builtin_set_fpscr_drn(-1);  /* { dg-error "Argument must be a value between 0 and 7" } */ 
-  __builtin_set_fpscr_drn(8);   /* { dg-error "Argument must be a value between 0 and 7" } */ 
+  __builtin_set_fpscr_drn(-1);  /* { dg-error "argument 1 must be a variable or a literal between 0 and 7, inclusive" } */ 
+  __builtin_set_fpscr_drn(8);   /* { dg-error "argument 1 must be a variable or a literal between 0 and 7, inclusive" } */ 
 
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c
index aea65091b0c..10391b71008 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_fpscr_rn_builtin_error.c
@@ -8,13 +8,13 @@ int main ()
      int arguments.  The builtins __builtin_set_fpscr_rn() also supports a
      variable as an argument but can't test variable value at compile time.  */
 
-  __builtin_mtfsb0(-1);  /* { dg-error "Argument must be a constant between 0 and 31" } */
-  __builtin_mtfsb0(32);  /* { dg-error "Argument must be a constant between 0 and 31" } */
+  __builtin_mtfsb0(-1);  /* { dg-error "argument 1 must be a 5-bit unsigned literal" } */
+  __builtin_mtfsb0(32);  /* { dg-error "argument 1 must be a 5-bit unsigned literal" } */
 
-  __builtin_mtfsb1(-1);  /* { dg-error "Argument must be a constant between 0 and 31" } */
-  __builtin_mtfsb1(32);  /* { dg-error "Argument must be a constant between 0 and 31" } */ 
+  __builtin_mtfsb1(-1);  /* { dg-error "argument 1 must be a 5-bit unsigned literal" } */
+  __builtin_mtfsb1(32);  /* { dg-error "argument 1 must be a 5-bit unsigned literal" } */ 
 
-  __builtin_set_fpscr_rn(-1);  /* { dg-error "Argument must be a value between 0 and 3" } */ 
-  __builtin_set_fpscr_rn(4);   /* { dg-error "Argument must be a value between 0 and 3" } */ 
+  __builtin_set_fpscr_rn(-1);  /* { dg-error "argument 1 must be a variable or a literal between 0 and 3, inclusive" } */ 
+  __builtin_set_fpscr_rn(4);   /* { dg-error "argument 1 must be a variable or a literal between 0 and 3, inclusive" } */ 
 }
 
diff --git a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
index 41377efba1a..28c2b91988e 100644
--- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
+++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
@@ -1,5 +1,6 @@
 /* { dg-do run { target { powerpc*-*-* } } } */
-/* { dg-options "-O2 -std=c99" } */
+/* { dg-options "-O2 -std=c99 -mcpu=power9" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
 
 #ifdef DEBUG
 #include <stdio.h>
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c b/gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c
index 895bb953b37..4e59cbffa17 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c
@@ -20,7 +20,7 @@ do_vec_gnb (vector unsigned __int128 source, int stride)
     case 5:
       return vec_gnb (source, 1);	/* { dg-error "between 2 and 7" } */
     case 6:
-      return vec_gnb (source, stride);	/* { dg-error "unsigned literal" } */
+      return vec_gnb (source, stride);	/* { dg-error "literal" } */
     case 7:
       return vec_gnb (source, 7);
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
index f53c6dca0a9..d1ef054b488 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
@@ -12,5 +12,5 @@ test_all_not_equal_and_not_zero (vector unsigned short *arg1_p,
   vector unsigned short arg_2 = *arg2_p;
 
   return __builtin_vec_vcmpnez_p (__CR6_LT, arg_1, arg_2);
-  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
+  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eqz-7.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eqz-7.c
index 757acd93110..b5cdea5fb3e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eqz-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-any-eqz-7.c
@@ -11,5 +11,5 @@ test_any_equal (vector unsigned int *arg1_p, vector unsigned int *arg2_p)
   vector unsigned int arg_2 = *arg2_p;
 
   return __builtin_vec_vcmpnez_p (__CR6_LT_REV, arg_1, arg_2);
-  /* { dg-error "'__builtin_altivec_vcmpnezw_p' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
+  /* { dg-error "'__builtin_altivec_vcmpnezw_p' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cmpnez-7.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cmpnez-7.c
index 811b32f1c32..320421e6028 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cmpnez-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cmpnez-7.c
@@ -10,5 +10,5 @@ fetch_data (vector unsigned int *arg1_p, vector unsigned int *arg2_p)
   vector unsigned int arg_1 = *arg1_p;
   vector unsigned int arg_2 = *arg2_p;
 
-  return __builtin_vec_vcmpnez (arg_1, arg_2);	/* { dg-error "'__builtin_altivec_vcmpnezw' requires the '-mcpu=power9' option" } */
+  return __builtin_vec_vcmpnez (arg_1, arg_2);	/* { dg-error "'__builtin_altivec_vcmpnezw' requires the '-mpower9-vector' option" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c
index 6ee066d1eff..251285536c2 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c
@@ -9,5 +9,5 @@ count_leading_zero_byte_bits (vector unsigned char *arg1_p)
 {
   vector unsigned char arg_1 = *arg1_p;
 
-  return __builtin_vec_vclzlsbb (arg_1);	/* { dg-error "'__builtin_altivec_vclzlsbb_v16qi' requires the '-mcpu=power9' option" } */
+  return __builtin_vec_vclzlsbb (arg_1);	/* { dg-error "'__builtin_altivec_vclzlsbb_v16qi' requires the '-mpower9-vector' option" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c
index ecd0add70d0..83ca92daced 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c
@@ -9,5 +9,5 @@ count_trailing_zero_byte_bits (vector unsigned char *arg1_p)
 {
   vector unsigned char arg_1 = *arg1_p;
 
-  return __builtin_vec_vctzlsbb (arg_1);	/* { dg-error "'__builtin_altivec_vctzlsbb_v16qi' requires the '-mcpu=power9' option" } */
+  return __builtin_vec_vctzlsbb (arg_1);	/* { dg-error "'__builtin_altivec_vctzlsbb_v16qi' requires the '-mpower9-vector' option" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-xl-len-13.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-xl-len-13.c
index 1cfed57d6a6..0f601fbbb50 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-xl-len-13.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-xl-len-13.c
@@ -13,5 +13,5 @@
 int
 fetch_data (float *address, size_t length)
 {
-  return __builtin_vec_lxvl (address, length);	/* { dg-warning "'__builtin_vec_lxvl'" } */
+  return __builtin_vec_lxvl (address, length);	/* { dg-error "'__builtin_vsx_lxvl' requires the" } */
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/vsu/vec-xst-len-12.c b/gcc/testsuite/gcc.target/powerpc/vsu/vec-xst-len-12.c
index 3a51132a5a2..f30d49cb4cc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-xst-len-12.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-xst-len-12.c
@@ -13,5 +13,5 @@ store_data (vector double *datap, double *address, size_t length)
 {
   vector double data = *datap;
 
-  __builtin_vec_stxvl (data, address, length); /* { dg-error "'__builtin_vec_stxvl' is not supported in this compiler configuration" } */
+  __builtin_vec_stxvl (data, address, length); /* { dg-error "'__builtin_altivec_stxvl' requires the" } */
 }
-- 
2.27.0

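One recurring adjustment above deserves a note: altivec.h now pulls its
interface macros from a generated sub-header, so tests that re-include it
under a new target pragma must clear both include guards.  A minimal sketch
of the pattern the adjusted tests use (the cpu choice here is illustrative):

  #pragma GCC target ("cpu=power9,power9-vector")
  /* Force a re-read of altivec.h and its generated vecdefines header
     so the vec_* interfaces are redefined for the new target.  */
  #undef _ALTIVEC_H
  #undef _RS6000_VECDEFINES_H
  #include <altivec.h>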

* [PATCH 17/18] rs6000: Enable the new builtin support
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (15 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 16/18] rs6000: Test case adjustments Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 22:10   ` Segher Boessenkool
  2021-09-01 16:13 ` [PATCH 18/18] rs6000: Add escape-newline support for builtins files Bill Schmidt
  2021-09-13 13:33 ` [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

2021-03-05  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-gen-builtins.c (write_init_file):
	Initialize new_builtins_are_live to 1.
---
 gcc/config/rs6000/rs6000-gen-builtins.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index 7f711210aff..fdef65fe1d4 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -2791,7 +2791,7 @@ write_init_file (void)
   fprintf (init_file, "#include \"rs6000-builtins.h\"\n");
   fprintf (init_file, "\n");
 
-  fprintf (init_file, "int new_builtins_are_live = 0;\n\n");
+  fprintf (init_file, "int new_builtins_are_live = 1;\n\n");
 
   fprintf (init_file, "tree rs6000_builtin_decls_x[RS6000_OVLD_MAX];\n\n");
 
-- 
2.27.0


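For context, the opening of the generated rs6000-builtins.c initialization
file now reads roughly as follows (reconstructed from the fprintf calls in
write_init_file; only the flipped flag is new):

  #include "rs6000-builtins.h"

  int new_builtins_are_live = 1;

  tree rs6000_builtin_decls_x[RS6000_OVLD_MAX];
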
* [PATCH 18/18] rs6000: Add escape-newline support for builtins files
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (16 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 17/18] rs6000: Enable the new builtin support Bill Schmidt
@ 2021-09-01 16:13 ` Bill Schmidt
  2021-11-05 23:50   ` Segher Boessenkool
  2021-09-13 13:33 ` [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
  18 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-09-01 16:13 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc

2021-08-19  Bill Schmidt  <wschmidt@linux.ibm.com>

gcc/
	* config/rs6000/rs6000-builtin-new.def (VEC_INIT_V16QI): Use
	escape-newline support.
	(VEC_INIT_V4SI): Likewise.
	(VEC_INIT_V8HI): Likewise.
	(PACK_V1TI): Likewise.
	(DIVDEU): Likewise.
	(VFIRSTMISMATCHOREOSINDEX_V16QI): Likewise.
	(VFIRSTMISMATCHOREOSINDEX_V8HI): Likewise.
	(VFIRSTMISMATCHOREOSINDEX_V4SI): Likewise.
	(CMPRB2): Likewise.
	(VSTDCP): Likewise.
	(VSIEDP): Likewise.
	(FMAF128_ODD): Likewise.
	(VSCEQPUO): Likewise.
	(VSIEQP): Likewise.
	(VSIEQPF): Likewise.
	(VSTDCQP): Likewise.
	(PACK_TD): Likewise.
	(TABORTDC): Likewise.
	(TABORTDCI): Likewise.
	(SE_LXVRBX): Likewise.
	(SE_LXVRHX): Likewise.
	(SE_LXVRWX): Likewise.
	(SE_LXVRDX): Likewise.
	(VREPLACE_UN_UV2DI): Likewise.
	(VREPLACE_UN_UV4SI): Likewise.
	(VREPLACE_UN_V2DI): Likewise.
	(VREPLACE_ELT_UV2DI): Likewise.
	(VREPLACE_ELT_V2DI): Likewise.
	(ZE_LXVRBX): Likewise.
	(ZE_LXVRHX): Likewise.
	(ZE_LXVRWX): Likewise.
	(ZE_LXVRDX): Likewise.
	(CFUGED): Likewise.
	(CNTLZDM): Likewise.
	(CNTTZDM): Likewise.
	(PDEPD): Likewise.
	(PEXTD): Likewise.
	(PMXVBF16GER2): Likewise.
	(PMXVBF16GER2_INTERNAL): Likewise.
	(PMXVBF16GER2NN): Likewise.
	(PMXVBF16GER2NN_INTERNAL): Likewise.
	(PMXVBF16GER2NP): Likewise.
	(PMXVBF16GER2NP_INTERNAL): Likewise.
	(PMXVBF16GER2PN): Likewise.
	(PMXVBF16GER2PN_INTERNAL): Likewise.
	(PMXVBF16GER2PP): Likewise.
	(PMXVBF16GER2PP_INTERNAL): Likewise.
	(PMXVF16GER2): Likewise.
	(PMXVF16GER2_INTERNAL): Likewise.
	(PMXVF16GER2NN): Likewise.
	(PMXVF16GER2NN_INTERNAL): Likewise.
	(PMXVF16GER2NP): Likewise.
	(PMXVF16GER2NP_INTERNAL): Likewise.
	(PMXVF16GER2PN): Likewise.
	(PMXVF16GER2PN_INTERNAL): Likewise.
	(PMXVF16GER2PP): Likewise.
	(PMXVF16GER2PP_INTERNAL): Likewise.
	(PMXVF32GER_INTERNAL): Likewise.
	(PMXVF32GERNN): Likewise.
	(PMXVF32GERNN_INTERNAL): Likewise.
	(PMXVF32GERNP): Likewise.
	(PMXVF32GERNP_INTERNAL): Likewise.
	(PMXVF32GERPN): Likewise.
	(PMXVF32GERPN_INTERNAL): Likewise.
	(PMXVF32GERPP): Likewise.
	(PMXVF32GERPP_INTERNAL): Likewise.
	(PMXVF64GER): Likewise.
	(PMXVF64GER_INTERNAL): Likewise.
	(PMXVF64GERNN): Likewise.
	(PMXVF64GERNN_INTERNAL): Likewise.
	(PMXVF64GERNP): Likewise.
	(PMXVF64GERNP_INTERNAL): Likewise.
	(PMXVF64GERPN): Likewise.
	(PMXVF64GERPN_INTERNAL): Likewise.
	(PMXVF64GERPP): Likewise.
	(PMXVF64GERPP_INTERNAL): Likewise.
	(PMXVI16GER2): Likewise.
	(PMXVI16GER2_INTERNAL): Likewise.
	(PMXVI16GER2PP): Likewise.
	(PMXVI16GER2PP_INTERNAL): Likewise.
	(PMXVI16GER2S): Likewise.
	(PMXVI16GER2S_INTERNAL): Likewise.
	(PMXVI16GER2SPP): Likewise.
	(PMXVI16GER2SPP_INTERNAL): Likewise.
	(PMXVI4GER8): Likewise.
	(PMXVI4GER8_INTERNAL): Likewise.
	(PMXVI4GER8PP): Likewise.
	(PMXVI4GER8PP_INTERNAL): Likewise.
	(PMXVI8GER4): Likewise.
	(PMXVI8GER4_INTERNAL): Likewise.
	(PMXVI8GER4PP): Likewise.
	(PMXVI8GER4PP_INTERNAL): Likewise.
	(PMXVI8GER4SPP): Likewise.
	(PMXVI8GER4SPP_INTERNAL): Likewise.
	* config/rs6000/rs6000-gen-builtins.c (MAXLINES): New macro.
	(lines): New variable.
	(lastline): Likewise.
	(real_line_pos): New function.
	(diag): Change signature.
	(bif_diag): Change signature; support escape-newline handling.
	(ovld_diag): Likewise.
	(fatal): Move earlier.
	(consume_whitespace): Adjust diag call.
	(advance_line): Add escape-newline handling; call fatal.
	(safe_inc_pos): Adjust diag call.
	(match_identifier): Likewise.
	(match_integer): Likewise.
	(match_to_right_bracket): Call fatal instead of diag; adjust diag
	call.
	(match_basetype): Adjust diag calls.
	(match_bracketed_pair): Likewise.
	(match_const_restriction): Likewise.
	(match_type): Likewise.
	(parse_args): Likewise.
	(parse_bif_attrs): Likewise.
	(complete_vector_type): Likewise.
	(complete_base_type): Likewise.
	(parse_prototype): Likewise.
	(parse_bif_entry): Likewise.
	(parse_bif_stanza): Likewise.
	(parse_ovld_entry): Likewise.
	(parse_ovld_stanza): Likewise.
	(main): Allocate buffers for lines[].
---
 gcc/config/rs6000/rs6000-builtin-new.def | 288 +++++++++++++++--------
 gcc/config/rs6000/rs6000-gen-builtins.c  | 280 +++++++++++++---------
 2 files changed, 358 insertions(+), 210 deletions(-)

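The effect on the .def format is simply that a trailing backslash now
continues a logical line, so long prototypes can be wrapped to stay within
line-length limits, as in this entry from the hunks below:

  const unsigned long long __builtin_cfuged (unsigned long long, \
                                             unsigned long long);
    CFUGED cfuged {}
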
diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
index 1966516551e..c22aaf767c7 100644
--- a/gcc/config/rs6000/rs6000-builtin-new.def
+++ b/gcc/config/rs6000/rs6000-builtin-new.def
@@ -1094,16 +1094,22 @@
   const signed short __builtin_vec_ext_v8hi (vss, signed int);
     VEC_EXT_V8HI nothing {extract}
 
-  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char, signed char);
+  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
+            signed char, signed char, signed char, signed char, signed char, \
+            signed char, signed char, signed char, signed char, signed char, \
+            signed char, signed char, signed char);
     VEC_INIT_V16QI nothing {init}
 
   const vf __builtin_vec_init_v4sf (float, float, float, float);
     VEC_INIT_V4SF nothing {init}
 
-  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, signed int);
+  const vsi __builtin_vec_init_v4si (signed int, signed int, signed int, \
+                                     signed int);
     VEC_INIT_V4SI nothing {init}
 
-  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short, signed short, signed short, signed short, signed short, signed short);
+  const vss __builtin_vec_init_v8hi (signed short, signed short, signed short,\
+             signed short, signed short, signed short, signed short, \
+             signed short);
     VEC_INIT_V8HI nothing {init}
 
   const vsc __builtin_vec_set_v16qi (vsc, signed char, const int<4>);
@@ -2023,7 +2029,8 @@
   const unsigned int __builtin_divweu (unsigned int, unsigned int);
     DIVWEU diveu_si {}
 
-  const vsq __builtin_pack_vector_int128 (unsigned long long, unsigned long long);
+  const vsq __builtin_pack_vector_int128 (unsigned long long, \
+                                          unsigned long long);
     PACK_V1TI packv1ti {}
 
   void __builtin_ppc_speculation_barrier ();
@@ -2038,7 +2045,8 @@
   const signed long long __builtin_divde (signed long long, signed long long);
     DIVDE dive_di {}
 
-  const unsigned long long __builtin_divdeu (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_divdeu (unsigned long long, \
+                                             unsigned long long);
     DIVDEU diveu_di {}
 
 
@@ -2515,13 +2523,16 @@
   const signed int __builtin_altivec_first_mismatch_index_v4si (vsi, vsi);
     VFIRSTMISMATCHINDEX_V4SI first_mismatch_index_v4si {}
 
-  const signed int __builtin_altivec_first_mismatch_or_eos_index_v16qi (vsc, vsc);
+  const signed int \
+      __builtin_altivec_first_mismatch_or_eos_index_v16qi (vsc, vsc);
     VFIRSTMISMATCHOREOSINDEX_V16QI first_mismatch_or_eos_index_v16qi {}
 
-  const signed int __builtin_altivec_first_mismatch_or_eos_index_v8hi (vss, vss);
+  const signed int \
+      __builtin_altivec_first_mismatch_or_eos_index_v8hi (vss, vss);
     VFIRSTMISMATCHOREOSINDEX_V8HI first_mismatch_or_eos_index_v8hi {}
 
-  const signed int __builtin_altivec_first_mismatch_or_eos_index_v4si (vsi, vsi);
+  const signed int \
+      __builtin_altivec_first_mismatch_or_eos_index_v4si (vsi, vsi);
     VFIRSTMISMATCHOREOSINDEX_V4SI first_mismatch_or_eos_index_v4si {}
 
   const vsc __builtin_altivec_vadub (vsc, vsc);
@@ -2695,7 +2706,8 @@
   const signed int __builtin_scalar_byte_in_range (signed int, signed int);
     CMPRB cmprb {}
 
-  const signed int __builtin_scalar_byte_in_either_range (signed int, signed int);
+  const signed int \
+      __builtin_scalar_byte_in_either_range (signed int, signed int);
     CMPRB2 cmprb2 {}
 
   const vsll __builtin_vsx_extract4b (vsc, const int[0,12]);
@@ -2734,10 +2746,12 @@
   const signed int __builtin_vsx_scalar_cmp_exp_dp_unordered (double, double);
     VSCEDPUO xscmpexpdp_unordered {}
 
-  const signed int __builtin_vsx_scalar_test_data_class_dp (double, const int<7>);
+  const signed int \
+      __builtin_vsx_scalar_test_data_class_dp (double, const int<7>);
     VSTDCDP xststdcdp {}
 
-  const signed int __builtin_vsx_scalar_test_data_class_sp (float, const int<7>);
+  const signed int \
+      __builtin_vsx_scalar_test_data_class_sp (float, const int<7>);
     VSTDCSP xststdcsp {}
 
   const signed int __builtin_vsx_scalar_test_neg_dp (double);
@@ -2835,7 +2849,8 @@
   const signed long __builtin_vsx_scalar_extract_sig (double);
     VSESDP xsxsigdp {}
 
-  const double __builtin_vsx_scalar_insert_exp (unsigned long long, unsigned long long);
+  const double __builtin_vsx_scalar_insert_exp (unsigned long long, \
+                                                unsigned long long);
     VSIEDP xsiexpdp {}
 
   const double __builtin_vsx_scalar_insert_exp_dp (double, unsigned long long);
@@ -2853,7 +2868,8 @@
   fpmath _Float128 __builtin_divf128_round_to_odd (_Float128, _Float128);
     DIVF128_ODD divkf3_odd {}
 
-  fpmath _Float128 __builtin_fmaf128_round_to_odd (_Float128, _Float128, _Float128);
+  fpmath _Float128 __builtin_fmaf128_round_to_odd (_Float128, _Float128, \
+                                                   _Float128);
     FMAF128_ODD fmakf4_odd {}
 
   fpmath _Float128 __builtin_mulf128_round_to_odd (_Float128, _Float128);
@@ -2868,7 +2884,8 @@
   const signed int __builtin_vsx_scalar_cmp_exp_qp_lt (_Float128, _Float128);
     VSCEQPLT xscmpexpqp_lt_kf {}
 
-  const signed int __builtin_vsx_scalar_cmp_exp_qp_unordered (_Float128, _Float128);
+  const signed int \
+      __builtin_vsx_scalar_cmp_exp_qp_unordered (_Float128, _Float128);
     VSCEQPUO xscmpexpqp_unordered_kf {}
 
   fpmath _Float128 __builtin_sqrtf128_round_to_odd (_Float128);
@@ -2886,13 +2903,16 @@
   const signed __int128 __builtin_vsx_scalar_extract_sigq (_Float128);
     VSESQP xsxsigqp_kf {}
 
-  const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned __int128, unsigned long long);
+  const _Float128 __builtin_vsx_scalar_insert_exp_q (unsigned __int128, \
+                                                     unsigned long long);
     VSIEQP xsiexpqp_kf {}
 
-  const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, unsigned long long);
+  const _Float128 __builtin_vsx_scalar_insert_exp_qp (_Float128, \
+                                                      unsigned long long);
     VSIEQPF xsiexpqpf_kf {}
 
-  const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, const int<7>);
+  const signed int __builtin_vsx_scalar_test_data_class_qp (_Float128, \
+                                                            const int<7>);
     VSTDCQP xststdcqp_kf {}
 
   const signed int __builtin_vsx_scalar_test_neg_qp (_Float128);
@@ -2941,7 +2961,8 @@
   const signed long long __builtin_dxexq (_Decimal128);
     DXEXQ dfp_dxex_td {}
 
-  const _Decimal128 __builtin_pack_dec128 (unsigned long long, unsigned long long);
+  const _Decimal128 __builtin_pack_dec128 (unsigned long long, \
+                                           unsigned long long);
     PACK_TD packtd {}
 
   void __builtin_set_fpscr_drn (const int[0,7]);
@@ -3017,10 +3038,12 @@
   unsigned int __builtin_tabort (unsigned int);
     TABORT tabort {htm,htmcr}
 
-  unsigned int __builtin_tabortdc (unsigned long long, unsigned long long, unsigned long long);
+  unsigned int __builtin_tabortdc (unsigned long long, unsigned long long, \
+                                   unsigned long long);
     TABORTDC tabortdc {htm,htmcr}
 
-  unsigned int __builtin_tabortdci (unsigned long long, unsigned long long, unsigned long long);
+  unsigned int __builtin_tabortdci (unsigned long long, unsigned long long, \
+                                    unsigned long long);
     TABORTDCI tabortdci {htm,htmcr}
 
   unsigned int __builtin_tabortwc (unsigned int, unsigned int, unsigned int);
@@ -3115,16 +3138,20 @@
   const vui __builtin_altivec_mtvsrwm (unsigned long long);
     MTVSRWM vec_mtvsr_v4si {}
 
-  pure signed __int128 __builtin_altivec_se_lxvrbx (signed long, const signed char *);
+  pure signed __int128 __builtin_altivec_se_lxvrbx (signed long, \
+                                                    const signed char *);
     SE_LXVRBX vsx_lxvrbx {lxvrse}
 
-  pure signed __int128 __builtin_altivec_se_lxvrhx (signed long, const signed short *);
+  pure signed __int128 __builtin_altivec_se_lxvrhx (signed long, \
+                                                    const signed short *);
     SE_LXVRHX vsx_lxvrhx {lxvrse}
 
-  pure signed __int128 __builtin_altivec_se_lxvrwx (signed long, const signed int *);
+  pure signed __int128 __builtin_altivec_se_lxvrwx (signed long, \
+                                                    const signed int *);
     SE_LXVRWX vsx_lxvrwx {lxvrse}
 
-  pure signed __int128 __builtin_altivec_se_lxvrdx (signed long, const signed long long *);
+  pure signed __int128 __builtin_altivec_se_lxvrdx (signed long, \
+                                                    const signed long long *);
     SE_LXVRDX vsx_lxvrdx {lxvrse}
 
   void __builtin_altivec_tr_stxvrbx (vsq, signed long, signed char *);
@@ -3358,16 +3385,19 @@
   const vull __builtin_altivec_vpextd (vull, vull);
     VPEXTD vpextd {}
 
-  const vull __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, const int<4>);
+  const vull __builtin_altivec_vreplace_un_uv2di (vull, unsigned long long, \
+                                                  const int<4>);
     VREPLACE_UN_UV2DI vreplace_un_v2di {}
 
-  const vui __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, const int<4>);
+  const vui __builtin_altivec_vreplace_un_uv4si (vui, unsigned int, \
+                                                 const int<4>);
     VREPLACE_UN_UV4SI vreplace_un_v4si {}
 
   const vd __builtin_altivec_vreplace_un_v2df (vd, double, const int<4>);
     VREPLACE_UN_V2DF vreplace_un_v2df {}
 
-  const vsll __builtin_altivec_vreplace_un_v2di (vsll, signed long long, const int<4>);
+  const vsll __builtin_altivec_vreplace_un_v2di (vsll, signed long long, \
+                                                 const int<4>);
     VREPLACE_UN_V2DI vreplace_un_v2di {}
 
   const vf __builtin_altivec_vreplace_un_v4sf (vf, float, const int<4>);
@@ -3376,7 +3406,8 @@
   const vsi __builtin_altivec_vreplace_un_v4si (vsi, signed int, const int<4>);
     VREPLACE_UN_V4SI vreplace_un_v4si {}
 
-  const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, const int<1>);
+  const vull __builtin_altivec_vreplace_uv2di (vull, unsigned long long, \
+                                               const int<1>);
     VREPLACE_ELT_UV2DI vreplace_elt_v2di {}
 
   const vui __builtin_altivec_vreplace_uv4si (vui, unsigned int, const int<2>);
@@ -3385,7 +3416,8 @@
   const vd __builtin_altivec_vreplace_v2df (vd, double, const int<1>);
     VREPLACE_ELT_V2DF vreplace_elt_v2df {}
 
-  const vsll __builtin_altivec_vreplace_v2di (vsll, signed long long, const int<1>);
+  const vsll __builtin_altivec_vreplace_v2di (vsll, signed long long, \
+                                              const int<1>);
     VREPLACE_ELT_V2DI vreplace_elt_v2di {}
 
   const vf __builtin_altivec_vreplace_v4sf (vf, float, const int<2>);
@@ -3553,33 +3585,42 @@
   const vss __builtin_vsx_xxpermx_v8hi (vss, vss, vuc, const int<3>);
     XXPERMX_V8HI xxpermx {}
 
-  pure unsigned __int128 __builtin_altivec_ze_lxvrbx (signed long, const unsigned char *);
+  pure unsigned __int128 __builtin_altivec_ze_lxvrbx (signed long, \
+                                                      const unsigned char *);
     ZE_LXVRBX vsx_lxvrbx {lxvrze}
 
-  pure unsigned __int128 __builtin_altivec_ze_lxvrhx (signed long, const unsigned short *);
+  pure unsigned __int128 __builtin_altivec_ze_lxvrhx (signed long, \
+                                                      const unsigned short *);
     ZE_LXVRHX vsx_lxvrhx {lxvrze}
 
-  pure unsigned __int128 __builtin_altivec_ze_lxvrwx (signed long, const unsigned int *);
+  pure unsigned __int128 __builtin_altivec_ze_lxvrwx (signed long, \
+                                                      const unsigned int *);
     ZE_LXVRWX vsx_lxvrwx {lxvrze}
 
-  pure unsigned __int128 __builtin_altivec_ze_lxvrdx (signed long, const unsigned long long *);
+  pure unsigned __int128 \
+      __builtin_altivec_ze_lxvrdx (signed long, const unsigned long long *);
     ZE_LXVRDX vsx_lxvrdx {lxvrze}
 
 
 [power10-64]
-  const unsigned long long __builtin_cfuged (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_cfuged (unsigned long long, \
+                                             unsigned long long);
     CFUGED cfuged {}
 
-  const unsigned long long __builtin_cntlzdm (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_cntlzdm (unsigned long long, \
+                                              unsigned long long);
     CNTLZDM cntlzdm {}
 
-  const unsigned long long __builtin_cnttzdm (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_cnttzdm (unsigned long long, \
+                                              unsigned long long);
     CNTTZDM cnttzdm {}
 
-  const unsigned long long __builtin_pdepd (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_pdepd (unsigned long long, \
+                                            unsigned long long);
     PDEPD pdepd {}
 
-  const unsigned long long __builtin_pextd (unsigned long long, unsigned long long);
+  const unsigned long long __builtin_pextd (unsigned long long, \
+                                            unsigned long long);
     PEXTD pextd {}
 
 
@@ -3614,178 +3655,235 @@
   vuc __builtin_mma_disassemble_pair_internal (v256, const int<2>);
     DISASSEMBLE_PAIR_INTERNAL vsx_disassemble_pair {mma}
 
-  void __builtin_mma_pmxvbf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvbf16ger2 (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>, const int<2>);
     PMXVBF16GER2 nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvbf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvbf16ger2_internal (vuc, vuc, const int<4>, \
+                                            const int<4>, const int<2>);
     PMXVBF16GER2_INTERNAL mma_pmxvbf16ger2 {mma}
 
-  void __builtin_mma_pmxvbf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvbf16ger2nn (v512 *, vuc, vuc, const int<4>, \
+                                     const int<4>, const int<2>);
     PMXVBF16GER2NN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvbf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvbf16ger2nn_internal (v512, vuc, vuc, const int<4>, \
+                                              const int<4>, const int<2>);
     PMXVBF16GER2NN_INTERNAL mma_pmxvbf16ger2nn {mma,quad}
 
-  void __builtin_mma_pmxvbf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvbf16ger2np (v512 *, vuc, vuc, const int<4>, \
+                                     const int<4>, const int<2>);
     PMXVBF16GER2NP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvbf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvbf16ger2np_internal (v512, vuc, vuc, const int<4>, \
+                                              const int<4>, const int<2>);
     PMXVBF16GER2NP_INTERNAL mma_pmxvbf16ger2np {mma,quad}
 
-  void __builtin_mma_pmxvbf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvbf16ger2pn (v512 *, vuc, vuc, const int<4>, \
+                                     const int<4>, const int<2>);
     PMXVBF16GER2PN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvbf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvbf16ger2pn_internal (v512, vuc, vuc, const int<4>, \
+                                              const int<4>, const int<2>);
     PMXVBF16GER2PN_INTERNAL mma_pmxvbf16ger2pn {mma,quad}
 
-  void __builtin_mma_pmxvbf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvbf16ger2pp (v512 *, vuc, vuc, const int<4>, \
+                                     const int<4>, const int<2>);
     PMXVBF16GER2PP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvbf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvbf16ger2pp_internal (v512, vuc, vuc, const int<4>, \
+                                              const int<4>, const int<2>);
     PMXVBF16GER2PP_INTERNAL mma_pmxvbf16ger2pp {mma,quad}
 
-  void __builtin_mma_pmxvf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf16ger2 (v512 *, vuc, vuc, const int<4>, \
+                                  const int<4>, const int<2>);
     PMXVF16GER2 nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf16ger2_internal (vuc, vuc, const int<4>, \
+                                           const int<4>, const int<2>);
     PMXVF16GER2_INTERNAL mma_pmxvf16ger2 {mma}
 
-  void __builtin_mma_pmxvf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf16ger2nn (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<2>);
     PMXVF16GER2NN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf16ger2nn_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<2>);
     PMXVF16GER2NN_INTERNAL mma_pmxvf16ger2nn {mma,quad}
 
-  void __builtin_mma_pmxvf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf16ger2np (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<2>);
     PMXVF16GER2NP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf16ger2np_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<2>);
     PMXVF16GER2NP_INTERNAL mma_pmxvf16ger2np {mma,quad}
 
-  void __builtin_mma_pmxvf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf16ger2pn (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<2>);
     PMXVF16GER2PN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf16ger2pn_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<2>);
     PMXVF16GER2PN_INTERNAL mma_pmxvf16ger2pn {mma,quad}
 
-  void __builtin_mma_pmxvf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf16ger2pp (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<2>);
     PMXVF16GER2PP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf16ger2pp_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<2>);
     PMXVF16GER2PP_INTERNAL mma_pmxvf16ger2pp {mma,quad}
 
   void __builtin_mma_pmxvf32ger (v512 *, vuc, vuc, const int<4>, const int<4>);
     PMXVF32GER nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvf32ger_internal (vuc, vuc, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvf32ger_internal (vuc, vuc, const int<4>, \
+                                          const int<4>);
     PMXVF32GER_INTERNAL mma_pmxvf32ger {mma}
 
-  void __builtin_mma_pmxvf32gernn (v512 *, vuc, vuc, const int<4>, const int<4>);
+  void __builtin_mma_pmxvf32gernn (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>);
     PMXVF32GERNN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf32gernn_internal (v512, vuc, vuc, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvf32gernn_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>);
     PMXVF32GERNN_INTERNAL mma_pmxvf32gernn {mma,quad}
 
-  void __builtin_mma_pmxvf32gernp (v512 *, vuc, vuc, const int<4>, const int<4>);
+  void __builtin_mma_pmxvf32gernp (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>);
     PMXVF32GERNP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf32gernp_internal (v512, vuc, vuc, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvf32gernp_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>);
     PMXVF32GERNP_INTERNAL mma_pmxvf32gernp {mma,quad}
 
-  void __builtin_mma_pmxvf32gerpn (v512 *, vuc, vuc, const int<4>, const int<4>);
+  void __builtin_mma_pmxvf32gerpn (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>);
     PMXVF32GERPN nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf32gerpn_internal (v512, vuc, vuc, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvf32gerpn_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>);
     PMXVF32GERPN_INTERNAL mma_pmxvf32gerpn {mma,quad}
 
-  void __builtin_mma_pmxvf32gerpp (v512 *, vuc, vuc, const int<4>, const int<4>);
+  void __builtin_mma_pmxvf32gerpp (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>);
     PMXVF32GERPP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf32gerpp_internal (v512, vuc, vuc, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvf32gerpp_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>);
     PMXVF32GERPP_INTERNAL mma_pmxvf32gerpp {mma,quad}
 
-  void __builtin_mma_pmxvf64ger (v512 *, v256, vuc, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf64ger (v512 *, v256, vuc, const int<4>, \
+                                 const int<2>);
     PMXVF64GER nothing {mma,pair,mmaint}
 
-  v512 __builtin_mma_pmxvf64ger_internal (v256, vuc, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf64ger_internal (v256, vuc, const int<4>, \
+                                          const int<2>);
     PMXVF64GER_INTERNAL mma_pmxvf64ger {mma,pair}
 
-  void __builtin_mma_pmxvf64gernn (v512 *, v256, vuc, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf64gernn (v512 *, v256, vuc, const int<4>, \
+                                   const int<2>);
     PMXVF64GERNN nothing {mma,pair,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf64gernn_internal (v512, v256, vuc, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf64gernn_internal (v512, v256, vuc, const int<4>, \
+                                            const int<2>);
     PMXVF64GERNN_INTERNAL mma_pmxvf64gernn {mma,pair,quad}
 
-  void __builtin_mma_pmxvf64gernp (v512 *, v256, vuc, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf64gernp (v512 *, v256, vuc, const int<4>, \
+                                   const int<2>);
     PMXVF64GERNP nothing {mma,pair,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf64gernp_internal (v512, v256, vuc, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf64gernp_internal (v512, v256, vuc, const int<4>, \
+                                            const int<2>);
     PMXVF64GERNP_INTERNAL mma_pmxvf64gernp {mma,pair,quad}
 
-  void __builtin_mma_pmxvf64gerpn (v512 *, v256, vuc, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf64gerpn (v512 *, v256, vuc, const int<4>, \
+                                   const int<2>);
     PMXVF64GERPN nothing {mma,pair,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf64gerpn_internal (v512, v256, vuc, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf64gerpn_internal (v512, v256, vuc, const int<4>, \
+                                            const int<2>);
     PMXVF64GERPN_INTERNAL mma_pmxvf64gerpn {mma,pair,quad}
 
-  void __builtin_mma_pmxvf64gerpp (v512 *, v256, vuc, const int<4>, const int<2>);
+  void __builtin_mma_pmxvf64gerpp (v512 *, v256, vuc, const int<4>, \
+                                   const int<2>);
     PMXVF64GERPP nothing {mma,pair,quad,mmaint}
 
-  v512 __builtin_mma_pmxvf64gerpp_internal (v512, v256, vuc, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvf64gerpp_internal (v512, v256, vuc, const int<4>, \
+                                            const int<2>);
     PMXVF64GERPP_INTERNAL mma_pmxvf64gerpp {mma,pair,quad}
 
-  void __builtin_mma_pmxvi16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvi16ger2 (v512 *, vuc, vuc, const int<4>, \
+                                  const int<4>, const int<2>);
     PMXVI16GER2 nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvi16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvi16ger2_internal (vuc, vuc, const int<4>, \
+                                           const int<4>, const int<2>);
     PMXVI16GER2_INTERNAL mma_pmxvi16ger2 {mma}
 
-  void __builtin_mma_pmxvi16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvi16ger2pp (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<2>);
     PMXVI16GER2PP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvi16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvi16ger2pp_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<2>);
     PMXVI16GER2PP_INTERNAL mma_pmxvi16ger2pp {mma,quad}
 
-  void __builtin_mma_pmxvi16ger2s (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvi16ger2s (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>, const int<2>);
     PMXVI16GER2S nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvi16ger2s_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvi16ger2s_internal (vuc, vuc, const int<4>, \
+                                            const int<4>, const int<2>);
     PMXVI16GER2S_INTERNAL mma_pmxvi16ger2s {mma}
 
-  void __builtin_mma_pmxvi16ger2spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  void __builtin_mma_pmxvi16ger2spp (v512 *, vuc, vuc, const int<4>, \
+                                     const int<4>, const int<2>);
     PMXVI16GER2SPP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvi16ger2spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
+  v512 __builtin_mma_pmxvi16ger2spp_internal (v512, vuc, vuc, const int<4>, \
+                                              const int<4>, const int<2>);
     PMXVI16GER2SPP_INTERNAL mma_pmxvi16ger2spp {mma,quad}
 
-  void __builtin_mma_pmxvi4ger8 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<8>);
+  void __builtin_mma_pmxvi4ger8 (v512 *, vuc, vuc, const int<4>, \
+                                 const int<4>, const int<8>);
     PMXVI4GER8 nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvi4ger8_internal (vuc, vuc, const int<4>, const int<4>, const int<8>);
+  v512 __builtin_mma_pmxvi4ger8_internal (vuc, vuc, const int<4>, \
+                                          const int<4>, const int<8>);
     PMXVI4GER8_INTERNAL mma_pmxvi4ger8 {mma}
 
-  void __builtin_mma_pmxvi4ger8pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  void __builtin_mma_pmxvi4ger8pp (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>, const int<4>);
     PMXVI4GER8PP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvi4ger8pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvi4ger8pp_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>, const int<4>);
     PMXVI4GER8PP_INTERNAL mma_pmxvi4ger8pp {mma,quad}
 
-  void __builtin_mma_pmxvi8ger4 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  void __builtin_mma_pmxvi8ger4 (v512 *, vuc, vuc, const int<4>, \
+                                 const int<4>, const int<4>);
     PMXVI8GER4 nothing {mma,mmaint}
 
-  v512 __builtin_mma_pmxvi8ger4_internal (vuc, vuc, const int<4>, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvi8ger4_internal (vuc, vuc, const int<4>, \
+                                          const int<4>, const int<4>);
     PMXVI8GER4_INTERNAL mma_pmxvi8ger4 {mma}
 
-  void __builtin_mma_pmxvi8ger4pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  void __builtin_mma_pmxvi8ger4pp (v512 *, vuc, vuc, const int<4>, \
+                                   const int<4>, const int<4>);
     PMXVI8GER4PP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvi8ger4pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvi8ger4pp_internal (v512, vuc, vuc, const int<4>, \
+                                            const int<4>, const int<4>);
     PMXVI8GER4PP_INTERNAL mma_pmxvi8ger4pp {mma,quad}
 
-  void __builtin_mma_pmxvi8ger4spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  void __builtin_mma_pmxvi8ger4spp (v512 *, vuc, vuc, const int<4>, \
+                                    const int<4>, const int<4>);
     PMXVI8GER4SPP nothing {mma,quad,mmaint}
 
-  v512 __builtin_mma_pmxvi8ger4spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
+  v512 __builtin_mma_pmxvi8ger4spp_internal (v512, vuc, vuc, const int<4>, \
+                                             const int<4>, const int<4>);
     PMXVI8GER4SPP_INTERNAL mma_pmxvi8ger4spp {mma,quad}
 
   void __builtin_mma_xvbf16ger2 (v512 *, vuc, vuc);
diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
index fdef65fe1d4..3405ff0b7b4 100644
--- a/gcc/config/rs6000/rs6000-gen-builtins.c
+++ b/gcc/config/rs6000/rs6000-gen-builtins.c
@@ -188,6 +188,15 @@ static char linebuf[LINELEN];
 static int line;
 static int pos;
 
+/* Escape-newline support.  For readability, we prefer to allow developers
+   to use escape-newline to continue long lines to the next one.  We
+   maintain a buffer of "original" lines here, which are concatenated into
+   linebuf, above, and which can be used to convert the virtual line
+   position "line / pos" into actual line and position information.  */
+#define MAXLINES 4
+static char *lines[MAXLINES];
+static int lastline;
+
 /* Used to determine whether a type can be void (only return types).  */
 enum void_status
 {
@@ -568,31 +577,61 @@ static typemap type_map[TYPE_MAP_SIZE] =
     { "vp8hi",		"pixel_V8HI" },
   };
 
+/* From a possibly extended line with a virtual position, calculate
+   the current line and character position.  */
+static void
+real_line_pos (int diagpos, int *real_line, int *real_pos)
+{
+  *real_line = line - lastline;
+  *real_pos = diagpos;
+
+  for (int i = 0; i < MAXLINES && *real_pos > (int) strlen (lines[i]); i++)
+    {
+      (*real_line)++;
+      *real_pos -= strlen (lines[i]) - 2;
+    }
+
+  /* Convert from zero-base to one-base for printing.  */
+  (*real_pos)++;
+}
+
 /* Pointer to a diagnostic function.  */
-static void (*diag) (const char *, ...)
-  __attribute__ ((format (printf, 1, 2)));
+static void (*diag) (int, const char *, ...)
+  __attribute__ ((format (printf, 2, 3)));
 
 /* Custom diagnostics.  */
-static void __attribute__ ((format (printf, 1, 2)))
-bif_diag (const char * fmt, ...)
+static void __attribute__ ((format (printf, 2, 3)))
+bif_diag (int diagpos, const char * fmt, ...)
 {
   va_list args;
-  fprintf (stderr, "%s:%d: ", bif_path, line);
+  int real_line, real_pos;
+  real_line_pos (diagpos, &real_line, &real_pos);
+  fprintf (stderr, "%s:%d:%d: ", bif_path, real_line, real_pos);
   va_start (args, fmt);
   vfprintf (stderr, fmt, args);
   va_end (args);
 }
 
-static void __attribute__ ((format (printf, 1, 2)))
-ovld_diag (const char * fmt, ...)
+static void __attribute__ ((format (printf, 2, 3)))
+ovld_diag (int diagpos, const char * fmt, ...)
 {
   va_list args;
-  fprintf (stderr, "%s:%d: ", ovld_path, line);
+  int real_line, real_pos;
+  real_line_pos (diagpos, &real_line, &real_pos);
+  fprintf (stderr, "%s:%d:%d: ", ovld_path, real_line, real_pos);
   va_start (args, fmt);
   vfprintf (stderr, fmt, args);
   va_end (args);
 }
 
+/* Produce a fatal error message.  */
+static void
+fatal (const char *msg)
+{
+  fprintf (stderr, "FATAL: %s\n", msg);
+  abort ();
+}
+
 /* Pass over whitespace (other than a newline, which terminates the scan).  */
 static void
 consume_whitespace (void)
@@ -602,7 +641,7 @@ consume_whitespace (void)
 
   if (pos >= LINELEN)
     {
-      diag ("line length overrun at %d.\n", pos);
+      diag (pos, "line length overrun.\n");
       exit (1);
     }
 
@@ -620,8 +659,28 @@ advance_line (FILE *file)
 	return 0;
       line++;
       size_t len = strlen (linebuf);
+
+      /* Escape-newline processing.  */
+      lastline = 0;
+      if (len > 1)
+	{
+	  strcpy (lines[0], linebuf);
+	  while (linebuf[len - 2] == '\\'
+		 && linebuf[len - 1] == '\n')
+	    {
+	      if (lastline == MAXLINES - 1)
+		fatal ("number of supported overflow lines exceeded");
+	      lastline++;
+	      line++;
+	      if (!fgets (lines[lastline], LINELEN, file))
+		fatal ("unexpected end of file");
+	      strcpy (&linebuf[len - 2], lines[lastline]);
+	      len += strlen (lines[lastline]) - 2;
+	    }
+	}
+
       if (linebuf[len - 1] != '\n')
-	(*diag) ("line doesn't terminate with newline\n");
+	fatal ("line doesn't terminate with newline");
       pos = 0;
       consume_whitespace ();
       if (linebuf[pos] != '\n' && linebuf[pos] != ';')
@@ -634,7 +693,7 @@ safe_inc_pos (void)
 {
   if (++pos >= LINELEN)
     {
-      (*diag) ("line length overrun.\n");
+      diag (pos, "line length overrun.\n");
       exit (1);
     }
 }
@@ -651,7 +710,7 @@ match_identifier (void)
 
   if (lastpos >= LINELEN - 1)
     {
-      diag ("line length overrun at %d.\n", lastpos);
+      diag (lastpos, "line length overrun.\n");
       exit (1);
     }
 
@@ -681,7 +740,7 @@ match_integer (void)
 
   if (lastpos >= LINELEN - 1)
     {
-      diag ("line length overrun at %d.\n", lastpos);
+      diag (lastpos, "line length overrun.\n");
       exit (1);
     }
 
@@ -705,16 +764,13 @@ match_to_right_bracket (void)
   while (lastpos < LINELEN - 1 && linebuf[lastpos + 1] != ']')
     {
       if (linebuf[lastpos + 1] == '\n')
-	{
-	  (*diag) ("no ']' found before end of line.\n");
-	  exit (1);
-	}
+	fatal ("no ']' found before end of line");
       ++lastpos;
     }
 
   if (lastpos >= LINELEN - 1)
     {
-      diag ("line length overrun at %d.\n", lastpos);
+      diag (lastpos, "line length overrun.\n");
       exit (1);
     }
 
@@ -740,14 +796,6 @@ handle_pointer (typeinfo *typedata)
     }
 }
 
-/* Produce a fatal error message.  */
-static void
-fatal (const char *msg)
-{
-  fprintf (stderr, "FATAL: %s\n", msg);
-  abort ();
-}
-
 static bif_stanza
 stanza_name_to_stanza (const char *stanza_name)
 {
@@ -771,7 +819,7 @@ match_basetype (typeinfo *typedata)
   char *token = match_identifier ();
   if (!token)
     {
-      (*diag) ("missing base type in return type at column %d\n", pos + 1);
+      diag (pos, "missing base type in return type\n");
       return 0;
     }
 
@@ -825,7 +873,7 @@ match_basetype (typeinfo *typedata)
     typedata->base = BT_IBM128;
   else
     {
-      (*diag) ("unrecognized base type at column %d\n", oldpos + 1);
+      diag (oldpos, "unrecognized base type\n");
       return 0;
     }
 
@@ -845,13 +893,13 @@ match_bracketed_pair (typeinfo *typedata, char open, char close,
       char *x = match_integer ();
       if (x == NULL)
 	{
-	  (*diag) ("malformed integer at column %d.\n", oldpos + 1);
+	  diag (oldpos, "malformed integer.\n");
 	  return 0;
 	}
       consume_whitespace ();
       if (linebuf[pos] != ',')
 	{
-	  (*diag) ("missing comma at column %d.\n", pos + 1);
+	  diag (pos, "missing comma.\n");
 	  return 0;
 	}
       safe_inc_pos ();
@@ -860,7 +908,7 @@ match_bracketed_pair (typeinfo *typedata, char open, char close,
       char *y = match_integer ();
       if (y == NULL)
 	{
-	  (*diag) ("malformed integer at column %d.\n", oldpos + 1);
+	  diag (oldpos, "malformed integer.\n");
 	  return 0;
 	}
       typedata->restr = restr;
@@ -870,7 +918,7 @@ match_bracketed_pair (typeinfo *typedata, char open, char close,
       consume_whitespace ();
       if (linebuf[pos] != close)
 	{
-	  (*diag) ("malformed restriction at column %d.\n", pos + 1);
+	  diag (pos, "malformed restriction.\n");
 	  return 0;
 	}
       safe_inc_pos ();
@@ -905,7 +953,7 @@ match_const_restriction (typeinfo *typedata)
       char *x = match_integer ();
       if (x == NULL)
 	{
-	  (*diag) ("malformed integer at column %d.\n", oldpos + 1);
+	  diag (oldpos, "malformed integer.\n");
 	  return 0;
 	}
       consume_whitespace ();
@@ -918,7 +966,7 @@ match_const_restriction (typeinfo *typedata)
 	}
       else if (linebuf[pos] != ',')
 	{
-	  (*diag) ("malformed restriction at column %d.\n", pos + 1);
+	  diag (pos, "malformed restriction.\n");
 	  return 0;
 	}
       safe_inc_pos ();
@@ -926,7 +974,7 @@ match_const_restriction (typeinfo *typedata)
       char *y = match_integer ();
       if (y == NULL)
 	{
-	  (*diag) ("malformed integer at column %d.\n", oldpos + 1);
+	  diag (oldpos, "malformed integer.\n");
 	  return 0;
 	}
       typedata->restr = RES_RANGE;
@@ -936,7 +984,7 @@ match_const_restriction (typeinfo *typedata)
       consume_whitespace ();
       if (linebuf[pos] != '>')
 	{
-	  (*diag) ("malformed restriction at column %d.\n", pos + 1);
+	  diag (pos, "malformed restriction.\n");
 	  return 0;
 	}
       safe_inc_pos ();
@@ -1217,8 +1265,7 @@ match_type (typeinfo *typedata, int voidok)
 	return 1;
       if (typedata->base != BT_INT)
 	{
-	  (*diag)("'const' at %d requires pointer or integer type",
-		  oldpos + 1);
+	  diag (oldpos, "'const' requires pointer or integer type\n");
 	  return 0;
 	}
       consume_whitespace ();
@@ -1248,7 +1295,7 @@ parse_args (prototype *protoptr)
   consume_whitespace ();
   if (linebuf[pos] != '(')
     {
-      (*diag) ("missing '(' at column %d.\n", pos + 1);
+      diag (pos, "missing '('.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -1266,7 +1313,7 @@ parse_args (prototype *protoptr)
 	  {
 	    if (restr_cnt >= MAXRESTROPNDS)
 	      {
-		(*diag) ("More than two %d operands\n", MAXRESTROPNDS);
+		diag (pos, "more than %d operands\n", MAXRESTROPNDS);
 		return PC_PARSEFAIL;
 	      }
 	    restr_opnd[restr_cnt] = *nargs + 1;
@@ -1283,20 +1330,20 @@ parse_args (prototype *protoptr)
 	  safe_inc_pos ();
 	else if (linebuf[pos] != ')')
 	  {
-	    (*diag) ("arg not followed by ',' or ')' at column %d.\n",
-		     pos + 1);
+	    diag (pos, "arg not followed by ',' or ')'.\n");
 	    return PC_PARSEFAIL;
 	  }
 
 #ifdef DEBUG
-	(*diag) ("argument type: isvoid = %d, isconst = %d, isvector = %d, "
-		 "issigned = %d, isunsigned = %d, isbool = %d, ispixel = %d, "
-		 "ispointer = %d, base = %d, restr = %d, val1 = \"%s\", "
-		 "val2 = \"%s\", pos = %d.\n",
-		 argtype->isvoid, argtype->isconst, argtype->isvector,
-		 argtype->issigned, argtype->isunsigned, argtype->isbool,
-		 argtype->ispixel, argtype->ispointer, argtype->base,
-		 argtype->restr, argtype->val1, argtype->val2, pos + 1);
+	diag (0,
+	      "argument type: isvoid = %d, isconst = %d, isvector = %d, "
+	      "issigned = %d, isunsigned = %d, isbool = %d, ispixel = %d, "
+	      "ispointer = %d, base = %d, restr = %d, val1 = \"%s\", "
+	      "val2 = \"%s\", pos = %d.\n",
+	      argtype->isvoid, argtype->isconst, argtype->isvector,
+	      argtype->issigned, argtype->isunsigned, argtype->isbool,
+	      argtype->ispixel, argtype->ispointer, argtype->base,
+	      argtype->restr, argtype->val1, argtype->val2, pos + 1);
 #endif
       }
     else
@@ -1306,7 +1353,7 @@ parse_args (prototype *protoptr)
 	pos = oldpos;
 	if (linebuf[pos] != ')')
 	  {
-	    (*diag) ("badly terminated arg list at column %d.\n", pos + 1);
+	    diag (pos, "badly terminated arg list.\n");
 	    return PC_PARSEFAIL;
 	  }
 	safe_inc_pos ();
@@ -1323,7 +1370,7 @@ parse_bif_attrs (attrinfo *attrptr)
   consume_whitespace ();
   if (linebuf[pos] != '{')
     {
-      (*diag) ("missing attribute set at column %d.\n", pos + 1);
+      diag (pos, "missing attribute set.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -1383,7 +1430,7 @@ parse_bif_attrs (attrinfo *attrptr)
 	  attrptr->isendian = 1;
 	else
 	  {
-	    (*diag) ("unknown attribute at column %d.\n", oldpos + 1);
+	    diag (oldpos, "unknown attribute.\n");
 	    return PC_PARSEFAIL;
 	  }
 
@@ -1392,8 +1439,7 @@ parse_bif_attrs (attrinfo *attrptr)
 	  safe_inc_pos ();
 	else if (linebuf[pos] != '}')
 	  {
-	    (*diag) ("arg not followed by ',' or '}' at column %d.\n",
-		     pos + 1);
+	    diag (pos, "arg not followed by ',' or '}'.\n");
 	    return PC_PARSEFAIL;
 	  }
       }
@@ -1402,7 +1448,7 @@ parse_bif_attrs (attrinfo *attrptr)
 	pos = oldpos;
 	if (linebuf[pos] != '}')
 	  {
-	    (*diag) ("badly terminated attr set at column %d.\n", pos + 1);
+	    diag (pos, "badly terminated attr set.\n");
 	    return PC_PARSEFAIL;
 	  }
 	safe_inc_pos ();
@@ -1410,18 +1456,19 @@ parse_bif_attrs (attrinfo *attrptr)
   } while (attrname);
 
 #ifdef DEBUG
-  (*diag) ("attribute set: init = %d, set = %d, extract = %d, nosoft = %d, "
-	   "ldvec = %d, stvec = %d, reve = %d, pred = %d, htm = %d, "
-	   "htmspr = %d, htmcr = %d, mma = %d, quad = %d, pair = %d, "
-	   "mmaint = %d, no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, "
-	   "lxvrse = %d, lxvrze = %d, endian = %d.\n",
-	   attrptr->isinit, attrptr->isset, attrptr->isextract,
-	   attrptr->isnosoft, attrptr->isldvec, attrptr->isstvec,
-	   attrptr->isreve, attrptr->ispred, attrptr->ishtm, attrptr->ishtmspr,
-	   attrptr->ishtmcr, attrptr->ismma, attrptr->isquad, attrptr->ispair,
-	   attrptr->ismmaint, attrptr->isno32bit, attrptr->is32bit,
-	   attrptr->iscpu, attrptr->isldstmask, attrptr->islxvrse,
-	   attrptr->islxvrze, attrptr->isendian);
+  diag (0,
+	"attribute set: init = %d, set = %d, extract = %d, nosoft = %d, "
+	"ldvec = %d, stvec = %d, reve = %d, pred = %d, htm = %d, "
+	"htmspr = %d, htmcr = %d, mma = %d, quad = %d, pair = %d, "
+	"mmaint = %d, no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, "
+	"lxvrse = %d, lxvrze = %d, endian = %d.\n",
+	attrptr->isinit, attrptr->isset, attrptr->isextract,
+	attrptr->isnosoft, attrptr->isldvec, attrptr->isstvec,
+	attrptr->isreve, attrptr->ispred, attrptr->ishtm, attrptr->ishtmspr,
+	attrptr->ishtmcr, attrptr->ismma, attrptr->isquad, attrptr->ispair,
+	attrptr->ismmaint, attrptr->isno32bit, attrptr->is32bit,
+	attrptr->iscpu, attrptr->isldstmask, attrptr->islxvrse,
+	attrptr->islxvrze, attrptr->isendian);
 #endif
 
   return PC_OK;
@@ -1483,7 +1530,7 @@ complete_vector_type (typeinfo *typeptr, char *buf, int *bufi)
       *bufi += 4;
       break;
     default:
-      (*diag) ("unhandled basetype %d.\n", typeptr->base);
+      diag (pos, "unhandled basetype %d.\n", typeptr->base);
       exit (1);
     }
 }
@@ -1543,7 +1590,7 @@ complete_base_type (typeinfo *typeptr, char *buf, int *bufi)
       memcpy (&buf[*bufi], "if", 2);
       break;
     default:
-      (*diag) ("unhandled basetype %d.\n", typeptr->base);
+      diag (pos, "unhandled basetype %d.\n", typeptr->base);
       exit (1);
     }
 
@@ -1664,20 +1711,20 @@ parse_prototype (prototype *protoptr)
   int success = match_type (ret_type, VOID_OK);
   if (!success)
     {
-      (*diag) ("missing or badly formed return type at column %d.\n",
-	       oldpos + 1);
+      diag (oldpos, "missing or badly formed return type.\n");
       return PC_PARSEFAIL;
     }
 
 #ifdef DEBUG
-  (*diag) ("return type: isvoid = %d, isconst = %d, isvector = %d, "
-	   "issigned = %d, isunsigned = %d, isbool = %d, ispixel = %d, "
-	   "ispointer = %d, base = %d, restr = %d, val1 = \"%s\", "
-	   "val2 = \"%s\", pos = %d.\n",
-	   ret_type->isvoid, ret_type->isconst, ret_type->isvector,
-	   ret_type->issigned, ret_type->isunsigned, ret_type->isbool,
-	   ret_type->ispixel, ret_type->ispointer, ret_type->base,
-	   ret_type->restr, ret_type->val1, ret_type->val2, pos + 1);
+  diag (0,
+	"return type: isvoid = %d, isconst = %d, isvector = %d, "
+	"issigned = %d, isunsigned = %d, isbool = %d, ispixel = %d, "
+	"ispointer = %d, base = %d, restr = %d, val1 = \"%s\", "
+	"val2 = \"%s\", pos = %d.\n",
+	ret_type->isvoid, ret_type->isconst, ret_type->isvector,
+	ret_type->issigned, ret_type->isunsigned, ret_type->isbool,
+	ret_type->ispixel, ret_type->ispointer, ret_type->base,
+	ret_type->restr, ret_type->val1, ret_type->val2, pos + 1);
 #endif
 
   /* Get the bif name.  */
@@ -1686,12 +1733,12 @@ parse_prototype (prototype *protoptr)
   *bifname = match_identifier ();
   if (!*bifname)
     {
-      (*diag) ("missing function name at column %d.\n", oldpos + 1);
+      diag (oldpos, "missing function name.\n");
       return PC_PARSEFAIL;
     }
 
 #ifdef DEBUG
-  (*diag) ("function name is '%s'.\n", *bifname);
+  diag (0, "function name is '%s'.\n", *bifname);
 #endif
 
   /* Process arguments.  */
@@ -1702,14 +1749,14 @@ parse_prototype (prototype *protoptr)
   consume_whitespace ();
   if (linebuf[pos] != ';')
     {
-      (*diag) ("missing semicolon at column %d.\n", pos + 1);
+      diag (pos, "missing semicolon.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
   consume_whitespace ();
   if (linebuf[pos] != '\n')
     {
-      (*diag) ("garbage at end of line at column %d.\n", pos + 1);
+      diag (pos, "garbage at end of line.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1729,7 +1776,7 @@ parse_bif_entry (void)
   /* Allocate an entry in the bif table.  */
   if (num_bifs >= MAXBIFS - 1)
     {
-      (*diag) ("too many built-in functions.\n");
+      diag (pos, "too many built-in functions.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1742,7 +1789,7 @@ parse_bif_entry (void)
   char *token = match_identifier ();
   if (!token)
     {
-      (*diag) ("malformed entry at column %d\n", oldpos + 1);
+      diag (oldpos, "malformed entry.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1769,7 +1816,7 @@ parse_bif_entry (void)
   /* Now process line 2.  First up is the builtin id.  */
   if (!advance_line (bif_file))
     {
-      (*diag) ("unexpected EOF.\n");
+      diag (pos, "unexpected EOF.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1779,19 +1826,18 @@ parse_bif_entry (void)
   bifs[curr_bif].idname = match_identifier ();
   if (!bifs[curr_bif].idname)
     {
-      (*diag) ("missing builtin id at column %d.\n", pos + 1);
+      diag (pos, "missing builtin id.\n");
       return PC_PARSEFAIL;
     }
 
 #ifdef DEBUG
-  (*diag) ("ID name is '%s'.\n", bifs[curr_bif].idname);
+  diag (0, "ID name is '%s'.\n", bifs[curr_bif].idname);
 #endif
 
   /* Save the ID in a lookup structure.  */
   if (!rbt_insert (&bif_rbt, bifs[curr_bif].idname))
     {
-      (*diag) ("duplicate function ID '%s' at column %d.\n",
-	       bifs[curr_bif].idname, oldpos + 1);
+      diag (oldpos, "duplicate function ID '%s'.\n", bifs[curr_bif].idname);
       return PC_PARSEFAIL;
     }
 
@@ -1804,7 +1850,7 @@ parse_bif_entry (void)
 
   if (!rbt_insert (&bifo_rbt, buf))
     {
-      (*diag) ("internal error inserting '%s' in bifo_rbt\n", buf);
+      diag (pos, "internal error inserting '%s' in bifo_rbt\n", buf);
       return PC_PARSEFAIL;
     }
 
@@ -1813,12 +1859,12 @@ parse_bif_entry (void)
   bifs[curr_bif].patname = match_identifier ();
   if (!bifs[curr_bif].patname)
     {
-      (*diag) ("missing pattern name at column %d.\n", pos + 1);
+      diag (pos, "missing pattern name.\n");
       return PC_PARSEFAIL;
     }
 
 #ifdef DEBUG
-  (*diag) ("pattern name is '%s'.\n", bifs[curr_bif].patname);
+  diag (0, "pattern name is '%s'.\n", bifs[curr_bif].patname);
 #endif
 
   /* Process attributes.  */
@@ -1836,7 +1882,7 @@ parse_bif_stanza (void)
 
   if (linebuf[pos] != '[')
     {
-      (*diag) ("ill-formed stanza header at column %d.\n", pos + 1);
+      diag (pos, "ill-formed stanza header.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -1844,7 +1890,7 @@ parse_bif_stanza (void)
   const char *stanza_name = match_to_right_bracket ();
   if (!stanza_name)
     {
-      (*diag) ("no expression found in stanza header.\n");
+      diag (pos, "no expression found in stanza header.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1852,7 +1898,7 @@ parse_bif_stanza (void)
 
   if (linebuf[pos] != ']')
     {
-      (*diag) ("ill-formed stanza header at column %d.\n", pos + 1);
+      diag (pos, "ill-formed stanza header.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -1860,7 +1906,7 @@ parse_bif_stanza (void)
   consume_whitespace ();
   if (linebuf[pos] != '\n' && pos != LINELEN - 1)
     {
-      (*diag) ("garbage after stanza header.\n");
+      diag (pos, "garbage after stanza header.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1927,7 +1973,7 @@ parse_ovld_entry (void)
   /* Allocate an entry in the overload table.  */
   if (num_ovlds >= MAXOVLDS - 1)
     {
-      (*diag) ("too many overloads.\n");
+      diag (pos, "too many overloads.\n");
       return PC_PARSEFAIL;
     }
 
@@ -1948,7 +1994,7 @@ parse_ovld_entry (void)
      optional overload id.  */
   if (!advance_line (ovld_file))
     {
-      (*diag) ("unexpected EOF.\n");
+      diag (0, "unexpected EOF.\n");
       return PC_EOFILE;
     }
 
@@ -1960,18 +2006,18 @@ parse_ovld_entry (void)
   ovlds[curr_ovld].ovld_id_name = id;
   if (!id)
     {
-      (*diag) ("missing overload id at column %d.\n", pos + 1);
+      diag (pos, "missing overload id.\n");
       return PC_PARSEFAIL;
     }
 
 #ifdef DEBUG
-  (*diag) ("ID name is '%s'.\n", id);
+  diag (pos, "ID name is '%s'.\n", id);
 #endif
 
   /* The builtin id has to match one from the bif file.  */
   if (!rbt_find (&bif_rbt, id))
     {
-      (*diag) ("builtin ID '%s' not found in bif file.\n", id);
+      diag (pos, "builtin ID '%s' not found in bif file.\n", id);
       return PC_PARSEFAIL;
     }
 
@@ -1989,13 +2035,13 @@ parse_ovld_entry (void)
  /* Save the overload ID in a lookup structure.  */
   if (!rbt_insert (&ovld_rbt, id))
     {
-      (*diag) ("duplicate overload ID '%s' at column %d.\n", id, oldpos + 1);
+      diag (oldpos, "duplicate overload ID '%s'.\n", id);
       return PC_PARSEFAIL;
     }
 
   if (linebuf[pos] != '\n')
     {
-      (*diag) ("garbage at end of line at column %d.\n", pos + 1);
+      diag (pos, "garbage at end of line.\n");
       return PC_PARSEFAIL;
     }
   return PC_OK;
@@ -2012,7 +2058,7 @@ parse_ovld_stanza (void)
 
   if (linebuf[pos] != '[')
     {
-      (*diag) ("ill-formed stanza header at column %d.\n", pos + 1);
+      diag (pos, "ill-formed stanza header.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -2020,7 +2066,7 @@ parse_ovld_stanza (void)
   char *stanza_name = match_identifier ();
   if (!stanza_name)
     {
-      (*diag) ("no identifier found in stanza header.\n");
+      diag (pos, "no identifier found in stanza header.\n");
       return PC_PARSEFAIL;
     }
 
@@ -2028,7 +2074,7 @@ parse_ovld_stanza (void)
      with subsequent overload entries.  */
   if (num_ovld_stanzas >= MAXOVLDSTANZAS)
     {
-      (*diag) ("too many stanza headers.\n");
+      diag (pos, "too many stanza headers.\n");
       return PC_PARSEFAIL;
     }
 
@@ -2039,7 +2085,7 @@ parse_ovld_stanza (void)
   consume_whitespace ();
   if (linebuf[pos] != ',')
     {
-      (*diag) ("missing comma at column %d.\n", pos + 1);
+      diag (pos, "missing comma.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -2048,14 +2094,14 @@ parse_ovld_stanza (void)
   stanza->extern_name = match_identifier ();
   if (!stanza->extern_name)
     {
-      (*diag) ("missing external name at column %d.\n", pos + 1);
+      diag (pos, "missing external name.\n");
       return PC_PARSEFAIL;
     }
 
   consume_whitespace ();
   if (linebuf[pos] != ',')
     {
-      (*diag) ("missing comma at column %d.\n", pos + 1);
+      diag (pos, "missing comma.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -2064,7 +2110,7 @@ parse_ovld_stanza (void)
   stanza->intern_name = match_identifier ();
   if (!stanza->intern_name)
     {
-      (*diag) ("missing internal name at column %d.\n", pos + 1);
+      diag (pos, "missing internal name.\n");
       return PC_PARSEFAIL;
     }
 
@@ -2076,7 +2122,7 @@ parse_ovld_stanza (void)
       stanza->ifdef = match_identifier ();
       if (!stanza->ifdef)
 	{
-	  (*diag) ("missing ifdef token at column %d.\n", pos + 1);
+	  diag (pos, "missing ifdef token.\n");
 	  return PC_PARSEFAIL;
 	}
       consume_whitespace ();
@@ -2086,7 +2132,7 @@ parse_ovld_stanza (void)
 
   if (linebuf[pos] != ']')
     {
-      (*diag) ("ill-formed stanza header at column %d.\n", pos + 1);
+      diag (pos, "ill-formed stanza header.\n");
       return PC_PARSEFAIL;
     }
   safe_inc_pos ();
@@ -2094,7 +2140,7 @@ parse_ovld_stanza (void)
   consume_whitespace ();
   if (linebuf[pos] != '\n' && pos != LINELEN - 1)
     {
-      (*diag) ("garbage after stanza header.\n");
+      diag (pos, "garbage after stanza header.\n");
       return PC_PARSEFAIL;
     }
 
@@ -2943,6 +2989,10 @@ main (int argc, const char **argv)
       exit (1);
     }
 
+  /* Allocate some buffers.  */
+  for (int i = 0; i < MAXLINES; i++)
+    lines[i] = (char *) malloc (LINELEN);
+
   /* Initialize the balanced trees containing built-in function ids,
      overload function ids, and function type declaration ids.  */
   rbt_new (&bif_rbt);
-- 
2.27.0
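
As an aside for readers new to these files: the escape-newline support
above splices physical lines that end in a backslash into one logical
line, then maps diagnostic positions back to a physical line and column.
A minimal standalone sketch of that idea follows; the names, sizes, and
exact bookkeeping are illustrative (and unchecked for buffer overflow),
not the patch's:

/* Sketch of the escape-newline scheme: physical lines ending in
   "\<newline>" are spliced into one logical line, and a 0-based
   position on the logical line is mapped back to a physical line
   offset and a 1-based column for diagnostics.  Illustrative only.  */
#include <stdio.h>
#include <string.h>

#define NLINES 4
#define LEN 1024

static char joined[LEN];	/* the logical line */
static char phys[NLINES][LEN];	/* the original physical lines */
static int nphys;

/* Splice continuation lines from F into JOINED; 0 on EOF/overflow.  */
static int
read_logical_line (FILE *f)
{
  if (!fgets (phys[0], LEN, f))
    return 0;
  nphys = 1;
  strcpy (joined, phys[0]);
  size_t len = strlen (joined);
  while (len >= 2 && joined[len - 2] == '\\' && joined[len - 1] == '\n')
    {
      if (nphys == NLINES || !fgets (phys[nphys], LEN, f))
	return 0;
      /* The next physical line overwrites the "\<newline>".  */
      strcpy (&joined[len - 2], phys[nphys]);
      len += strlen (phys[nphys]) - 2;
      nphys++;
    }
  return 1;
}

/* Map POS on JOINED to a physical line offset and 1-based column.
   Each spliced line contributed strlen - 2 characters, since its
   trailing "\<newline>" is not present on the logical line.  */
static void
map_pos (int pos, int *line_off, int *col)
{
  *line_off = 0;
  while (*line_off < nphys - 1
	 && pos >= (int) strlen (phys[*line_off]) - 2)
    {
      pos -= strlen (phys[*line_off]) - 2;
      ++*line_off;
    }
  *col = pos + 1;
}

int
main (void)
{
  if (read_logical_line (stdin))
    {
      int l, c;
      map_pos ((int) strlen (joined) / 2, &l, &c);
      printf ("midpoint of logical line -> line +%d, column %d\n", l, c);
    }
  return 0;
}

The patch's real_line_pos starts its bookkeeping from line - lastline
rather than from zero, but the per-spliced-line subtraction of
strlen () - 2, for the vanished backslash and newline, is the same idea.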


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCHv5 00/18] Replace the Power target-specific builtin machinery
  2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
                   ` (17 preceding siblings ...)
  2021-09-01 16:13 ` [PATCH 18/18] rs6000: Add escape-newline support for builtins files Bill Schmidt
@ 2021-09-13 13:33 ` Bill Schmidt
  18 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-13 13:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: dje.gcc, segher

Ping.

Message-Id: <cover.1630511334.git.wschmidt@linux.ibm.com>

Thanks!
Bill

On 9/1/21 11:13 AM, Bill Schmidt via Gcc-patches wrote:
> Hi!
>
> Original patch series here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-April/568840.html
>
> V2 patch series here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572231.html
>
> V3 patch series here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/573020.html
>
> V4 patch series here:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/576284.html
>
> Thanks for all the reviews so far!  We're into the home stretch.  I needed
> to rebase this series again in order to pick up some changes from upstream.
>
> Patch 01/18 is a reposting of V4 patch 19/34, addressing some of the
> comments.  Full refactoring of this stuff will be done later, after this
> patch series can burn in a little.  This wasn't yet formally approved.
>
> Patch 02/18 is new, and is a minor bug fix.
>
> Patches 03/18 through 17/18 correspond to V4 patches 20/34 through 34/34.
> These were adjusted for upstream changes, and I did some formatting
> cleanups.  I also provided better descriptions for some of the patches.
>
> Patch 18/18 is new, and improves the parser to handle escape-newline
> input.  With that in place, it cleans up all the long lines in the
> input files.
>
> Bootstrapped and tested on powerpc64le-linux-gnu (P10) and
> powerpc64-linux-gnu (32- and 64-bit, P8).  There are no regressions for
> little endian.  There are a small handful of big-endian regressions that
> have crept in, and I'll post patches for those after I work through them.
> But no need to hold up reviews on the rest of this in the meantime.
>
> Thanks again for all of the helpful reviews so far!
>
> Bill
>
>
> Bill Schmidt (18):
>    rs6000: Handle overloads during program parsing
>    rs6000: Move __builtin_mffsl to the [always] stanza
>    rs6000: Handle gimple folding of target built-ins
>    rs6000: Handle some recent MMA builtin changes
>    rs6000: Support for vectorizing built-in functions
>    rs6000: Builtin expansion, part 1
>    rs6000: Builtin expansion, part 2
>    rs6000: Builtin expansion, part 3
>    rs6000: Builtin expansion, part 4
>    rs6000: Builtin expansion, part 5
>    rs6000: Builtin expansion, part 6
>    rs6000: Update rs6000_builtin_decl
>    rs6000: Miscellaneous uses of rs6000_builtins_decl_x
>    rs6000: Debug support
>    rs6000: Update altivec.h for automated interfaces
>    rs6000: Test case adjustments
>    rs6000: Enable the new builtin support
>    rs6000: Add escape-newline support for builtins files
>
>   gcc/config/rs6000/altivec.h                   |  519 +--
>   gcc/config/rs6000/rs6000-builtin-new.def      |  442 ++-
>   gcc/config/rs6000/rs6000-c.c                  | 1088 ++++++
>   gcc/config/rs6000/rs6000-call.c               | 3132 +++++++++++++++--
>   gcc/config/rs6000/rs6000-gen-builtins.c       |  312 +-
>   gcc/config/rs6000/rs6000.c                    |  272 +-
>   .../powerpc/bfp/scalar-extract-exp-2.c        |    2 +-
>   .../powerpc/bfp/scalar-extract-sig-2.c        |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-2.c         |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-5.c         |    2 +-
>   .../powerpc/bfp/scalar-insert-exp-8.c         |    2 +-
>   .../powerpc/bfp/scalar-test-neg-2.c           |    2 +-
>   .../powerpc/bfp/scalar-test-neg-3.c           |    2 +-
>   .../powerpc/bfp/scalar-test-neg-5.c           |    2 +-
>   .../gcc.target/powerpc/byte-in-set-2.c        |    2 +-
>   gcc/testsuite/gcc.target/powerpc/cmpb-2.c     |    2 +-
>   gcc/testsuite/gcc.target/powerpc/cmpb32-2.c   |    2 +-
>   .../gcc.target/powerpc/crypto-builtin-2.c     |   14 +-
>   .../powerpc/fold-vec-splat-floatdouble.c      |    4 +-
>   .../powerpc/fold-vec-splat-longlong.c         |   10 +-
>   .../powerpc/fold-vec-splat-misc-invalid.c     |    8 +-
>   .../gcc.target/powerpc/int_128bit-runnable.c  |    6 +-
>   .../gcc.target/powerpc/p8vector-builtin-8.c   |    1 +
>   gcc/testsuite/gcc.target/powerpc/pr80315-1.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-2.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-3.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr80315-4.c  |    2 +-
>   gcc/testsuite/gcc.target/powerpc/pr88100.c    |   12 +-
>   .../gcc.target/powerpc/pragma_misc9.c         |    2 +-
>   .../gcc.target/powerpc/pragma_power8.c        |    2 +
>   .../gcc.target/powerpc/pragma_power9.c        |    3 +
>   .../powerpc/test_fpscr_drn_builtin_error.c    |    4 +-
>   .../powerpc/test_fpscr_rn_builtin_error.c     |   12 +-
>   gcc/testsuite/gcc.target/powerpc/test_mffsl.c |    3 +-
>   gcc/testsuite/gcc.target/powerpc/vec-gnb-2.c  |    2 +-
>   .../gcc.target/powerpc/vsu/vec-all-nez-7.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-any-eqz-7.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cmpnez-7.c     |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cntlz-lsbb-2.c |    2 +-
>   .../gcc.target/powerpc/vsu/vec-cnttz-lsbb-2.c |    2 +-
>   .../gcc.target/powerpc/vsu/vec-xl-len-13.c    |    2 +-
>   .../gcc.target/powerpc/vsu/vec-xst-len-12.c   |    2 +-
>   42 files changed, 4803 insertions(+), 1089 deletions(-)
>


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 01/18] rs6000: Handle overloads during program parsing
  2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
@ 2021-09-13 17:17   ` will schmidt
  2021-09-13 23:53   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: will schmidt @ 2021-09-13 17:17 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches; +Cc: dje.gcc, segher

On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:

Hi, 
  Just a couple of cosmetic nits noted below, the majority of which are also in
the original code this is based on.
Thanks
-Will


> Although this patch looks quite large, the changes are fairly minimal.
> Most of it is duplicating the large function that does the overload
> resolution using the automatically generated data structures instead of
> the old hand-generated ones.  This doesn't make the patch terribly easy to
> review, unfortunately.  Just be aware that generally we aren't changing
> the logic and functionality of overload handling.

ok


> 
> 2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000-c.c (rs6000-builtins.h): New include.
> 	(altivec_resolve_new_overloaded_builtin): New forward decl.
> 	(rs6000_new_builtin_type_compatible): New function.
> 	(altivec_resolve_overloaded_builtin): Call
> 	altivec_resolve_new_overloaded_builtin.
> 	(altivec_build_new_resolved_builtin): New function.
> 	(altivec_resolve_new_overloaded_builtin): Likewise.
> 	* config/rs6000/rs6000-call.c (rs6000_new_builtin_is_supported):
> 	Likewise.
> 	* config/rs6000/rs6000-gen-builtins.c (write_decls): Remove _p from
> 	name of rs6000_new_builtin_is_supported.


ok

> ---
>  gcc/config/rs6000/rs6000-c.c            | 1088 +++++++++++++++++++++++
>  gcc/config/rs6000/rs6000-call.c         |   53 ++
>  gcc/config/rs6000/rs6000-gen-builtins.c |    2 +-
>  3 files changed, 1142 insertions(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index afcb5bb6e39..aafb4e6a98f 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -35,6 +35,9 @@
>  #include "langhooks.h"
>  #include "c/c-tree.h"
> 
> +#include "rs6000-builtins.h"
> +
> +static tree altivec_resolve_new_overloaded_builtin (location_t, tree, void *);
> 
> 
>  /* Handle the machine specific pragma longcall.  Its syntax is
> @@ -811,6 +814,30 @@ is_float128_p (tree t)
>  	      && t == long_double_type_node));
>  }
> 
> +static bool
> +rs6000_new_builtin_type_compatible (tree t, tree u)
> +{
> +  if (t == error_mark_node)
> +    return false;
> +
> +  if (INTEGRAL_TYPE_P (t) && INTEGRAL_TYPE_P (u))
> +    return true;
> +
> +  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
> +      && is_float128_p (t) && is_float128_p (u))
> +    return true;
> +
> +  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
> +    {
> +      t = TREE_TYPE (t);
> +      u = TREE_TYPE (u);
> +      if (TYPE_READONLY (u))
> +	t = build_qualified_type (t, TYPE_QUAL_CONST);
> +    }
> +
> +  return lang_hooks.types_compatible_p (t, u);
> +}
> +

ok
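
One observation for readers: the pointer case here mirrors ordinary C
qualification rules, so adding const to the pointee when the parameter
expects it is accepted, while dropping const is left to the language
hook (and normally rejected).  A tiny plain-C illustration of the same
rule, not the builtin machinery itself:

void
qualification_demo (void)
{
  int x = 0;
  int *q = &x;
  const int *p = q;	/* OK: the pointee gains const, matching the
			   build_qualified_type adjustment above.  */
  (void) p;
#if 0
  int *r = p;		/* Rejected by C: the pointee would lose
			   const; the hook above likewise has no
			   branch that strips qualifiers from U.  */
#endif
}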

>  static inline bool
>  rs6000_builtin_type_compatible (tree t, int id)
>  {
> @@ -927,6 +954,10 @@ tree
>  altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>  				    void *passed_arglist)
>  {
> +  if (new_builtins_are_live)
> +    return altivec_resolve_new_overloaded_builtin (loc, fndecl,
> +						   passed_arglist);
> +
>    vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
>    unsigned int nargs = vec_safe_length (arglist);
>    enum rs6000_builtins fcode

ok

> @@ -1930,3 +1961,1060 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>      return error_mark_node;
>    }
>  }
> +
> +/* Build a tree for a function call to an Altivec non-overloaded builtin.
> +   The overloaded builtin that matched the types and args is described
> +   by DESC.  The N arguments are given in ARGS, respectively.
> +
> +   Actually the only thing it does is calling fold_convert on ARGS, with
> +   a small exception for vec_{all,any}_{ge,le} predicates. */
> +
> +static tree
> +altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
> +				    tree ret_type,
> +				    rs6000_gen_builtins bif_id,
> +				    rs6000_gen_builtins ovld_id)
> +{
> +  tree argtypes = TYPE_ARG_TYPES (fntype);
> +  tree arg_type[MAX_OVLD_ARGS];
> +  tree fndecl = rs6000_builtin_decls_x[bif_id];
> +  tree call;
> +
> +  for (int i = 0; i < n; i++)
> +    arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);
> +
> +  /* The AltiVec overloading implementation is overall gross, but this
> +     is particularly disgusting.  The vec_{all,any}_{ge,le} builtins
> +     are completely different for floating-point vs. integer vector
> +     types, because the former has vcmpgefp, but the latter should use
> +     vcmpgtXX.
> +
> +     In practice, the second and third arguments are swapped, and the
> +     condition (LT vs. EQ, which is recognizable by bit 1 of the first
> +     argument) is reversed.  Patch the arguments here before building
> +     the resolved CALL_EXPR.  */
> +  if (n == 3
> +      && ovld_id == RS6000_OVLD_VEC_CMPGE_P
> +      && bif_id != RS6000_BIF_VCMPGEFP_P
> +      && bif_id != RS6000_BIF_XVCMPGEDP_P)
> +    {
> +      std::swap (args[1], args[2]);
> +      std::swap (arg_type[1], arg_type[2]);
> +
> +      args[0] = fold_build2 (BIT_XOR_EXPR, TREE_TYPE (args[0]), args[0],
> +			     build_int_cst (NULL_TREE, 2));
> +    }
> +
> +  /* If the number of arguments to an overloaded function increases,
> +     we must expand this switch.  */
> +  gcc_assert (MAX_OVLD_ARGS <= 4);

Ok.   
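
For anyone puzzling over the swap-and-invert described in the big
comment above: the identity being exploited is that elementwise
a >= b holds exactly when !(b > a), so an all-greater-equal predicate
can be built from a greater-than compare with the operands reversed
and the any/all sense flipped, which is (roughly) what the XOR of
bit 1 into the first argument accomplishes.  A scalar sketch of the
identity, illustrative and not the generated code:

#include <assert.h>
#include <stdbool.h>

/* all_ge (a, b): every element satisfies a[i] >= b[i].  With only a
   greater-than compare available, test instead that no element
   satisfies b[i] > a[i]: operands swapped, condition inverted.  */
static bool
all_ge (const int *a, const int *b, int n)
{
  for (int i = 0; i < n; i++)
    if (b[i] > a[i])
      return false;
  return true;
}

int
main (void)
{
  int a[4] = { 3, 3, 5, 7 };
  int b[4] = { 1, 3, 5, 6 };
  assert (all_ge (a, b, 4));
  b[3] = 8;
  assert (!all_ge (a, b, 4));
  return 0;
}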


> +
> +  switch (n)
> +    {
> +    case 0:
> +      call = build_call_expr (fndecl, 0);
> +      break;
> +    case 1:
> +      call = build_call_expr (fndecl, 1,
> +			      fully_fold_convert (arg_type[0], args[0]));
> +      break;
> +    case 2:
> +      call = build_call_expr (fndecl, 2,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]));
> +      break;
> +    case 3:
> +      call = build_call_expr (fndecl, 3,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]),
> +			      fully_fold_convert (arg_type[2], args[2]));
> +      break;
> +    case 4:
> +      call = build_call_expr (fndecl, 4,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]),
> +			      fully_fold_convert (arg_type[2], args[2]),
> +			      fully_fold_convert (arg_type[3], args[3]));
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +  return fold_convert (ret_type, call);
> +}
> +
> +/* Implementation of the resolve_overloaded_builtin target hook, to
> +   support Altivec's overloaded builtins.  */
> +
> +static tree
> +altivec_resolve_new_overloaded_builtin (location_t loc, tree fndecl,
> +					void *passed_arglist)
> +{
> +  vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
> +  unsigned int nargs = vec_safe_length (arglist);
> +  enum rs6000_gen_builtins fcode
> +    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +  tree fnargs = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
> +  tree types[MAX_OVLD_ARGS], args[MAX_OVLD_ARGS];
> +  unsigned int n;
> +
> +  /* Return immediately if this isn't an overload.  */
> +  if (fcode <= RS6000_OVLD_NONE)
> +    return NULL_TREE;
> +
> +  unsigned int adj_fcode = fcode - RS6000_OVLD_NONE;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr, "altivec_resolve_overloaded_builtin, code = %4d, %s\n",
> +	     (int) fcode, IDENTIFIER_POINTER (DECL_NAME (fndecl)));
> +
> +  /* vec_lvsl and vec_lvsr are deprecated for use with LE element order.  */
> +  if (fcode == RS6000_OVLD_VEC_LVSL && !BYTES_BIG_ENDIAN)
> +    warning (OPT_Wdeprecated,
> +	     "%<vec_lvsl%> is deprecated for little endian; use "
> +	     "assignment for unaligned loads and stores");
> +  else if (fcode == RS6000_OVLD_VEC_LVSR && !BYTES_BIG_ENDIAN)
> +    warning (OPT_Wdeprecated,
> +	     "%<vec_lvsr%> is deprecated for little endian; use "
> +	     "assignment for unaligned loads and stores");
> +
> +  if (fcode == RS6000_OVLD_VEC_MUL)
> +    {
> +      /* vec_mul needs to be special cased because there are no instructions
> +	 for it for the {un}signed char, {un}signed short, and {un}signed int
> +	 types.  */
> +      if (nargs != 2)
> +	{
> +	  error ("builtin %qs only accepts 2 arguments", "vec_mul");
> +	  return error_mark_node;
> +	}
> +
> +      tree arg0 = (*arglist)[0];
> +      tree arg0_type = TREE_TYPE (arg0);
> +      tree arg1 = (*arglist)[1];
> +      tree arg1_type = TREE_TYPE (arg1);
> +
> +      /* Both arguments must be vectors and the types must be compatible.  */
> +      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type))
> +	goto bad;
> +
> +      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
> +	{
> +	  case E_QImode:
> +	  case E_HImode:
> +	  case E_SImode:
> +	  case E_DImode:
> +	  case E_TImode:
> +	    {
> +	      /* For scalar types just use a multiply expression.  */
> +	      return fold_build2_loc (loc, MULT_EXPR, TREE_TYPE (arg0), arg0,
> +				      fold_convert (TREE_TYPE (arg0), arg1));
> +	    }
> +	  case E_SFmode:
> +	    {
> +	      /* For floats use the xvmulsp instruction directly.  */
> +	      tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULSP];
> +	      return build_call_expr (call, 2, arg0, arg1);
> +	    }
> +	  case E_DFmode:
> +	    {
> +	      /* For doubles use the xvmuldp instruction directly.  */
> +	      tree call = rs6000_builtin_decls_x[RS6000_BIF_XVMULDP];
> +	      return build_call_expr (call, 2, arg0, arg1);
> +	    }
> +	  /* Other types are errors.  */
> +	  default:
> +	    goto bad;
> +	}
> +    }
> +
> +  if (fcode == RS6000_OVLD_VEC_CMPNE)
> +    {
> +      /* vec_cmpne needs to be special cased because there are no instructions
> +	 for it (prior to power 9).  */
> +      if (nargs != 2)
> +	{
> +	  error ("builtin %qs only accepts 2 arguments", "vec_cmpne");
> +	  return error_mark_node;
> +	}
> +
> +      tree arg0 = (*arglist)[0];
> +      tree arg0_type = TREE_TYPE (arg0);
> +      tree arg1 = (*arglist)[1];
> +      tree arg1_type = TREE_TYPE (arg1);
> +
> +      /* Both arguments must be vectors and the types must be compatible.  */
> +      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type))
> +	goto bad;
> +
> +      /* Power9 instructions provide the most efficient implementation of
> +	 ALTIVEC_BUILTIN_VEC_CMPNE if the mode is not DImode or TImode
> +	 or SFmode or DFmode.  */
> +      if (!TARGET_P9_VECTOR
> +	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == DImode)
> +	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == TImode)
> +	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == SFmode)
> +	  || (TYPE_MODE (TREE_TYPE (arg0_type)) == DFmode))
> +	{
> +	  switch (TYPE_MODE (TREE_TYPE (arg0_type)))
> +	    {
> +	      /* vec_cmpneq (va, vb) == vec_nor (vec_cmpeq (va, vb),
> +		 vec_cmpeq (va, vb)).  */
> +	      /* Note:  vec_nand also works but opt changes vec_nand's
> +		 to vec_nor's anyway.  */
> +	    case E_QImode:
> +	    case E_HImode:
> +	    case E_SImode:
> +	    case E_DImode:
> +	    case E_TImode:
> +	    case E_SFmode:
> +	    case E_DFmode:
> +	      {
> +		/* call = vec_cmpeq (va, vb)
> +		   result = vec_nor (call, call).  */
> +		vec<tree, va_gc> *params = make_tree_vector ();
> +		vec_safe_push (params, arg0);
> +		vec_safe_push (params, arg1);
> +		tree call = altivec_resolve_new_overloaded_builtin
> +		  (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_CMPEQ],
> +		   params);
> +		/* Use save_expr to ensure that operands used more than once
> +		   that may have side effects (like calls) are only evaluated
> +		   once.  */
> +		call = save_expr (call);
> +		params = make_tree_vector ();
> +		vec_safe_push (params, call);
> +		vec_safe_push (params, call);
> +		return altivec_resolve_new_overloaded_builtin
> +		  (loc, rs6000_builtin_decls_x[RS6000_OVLD_VEC_NOR], params);
> +	      }
> +	      /* Other types are errors.  */
> +	    default:
> +	      goto bad;
> +	    }
> +	}
> +      /* else, fall through and process the Power9 alternative below */
> +    }
> +
> +  if (fcode == RS6000_OVLD_VEC_ADDE || fcode == RS6000_OVLD_VEC_SUBE)
> +    {
> +      /* vec_adde needs to be special cased because there is no instruction
> +	  for the {un}signed int version.  */
> +      if (nargs != 3)
> +	{
> +	  const char *name;
> +	  name = fcode == RS6000_OVLD_VEC_ADDE ? "vec_adde" : "vec_sube";
> +	  error ("builtin %qs only accepts 3 arguments", name);
> +	  return error_mark_node;
> +	}
> +
> +      tree arg0 = (*arglist)[0];
> +      tree arg0_type = TREE_TYPE (arg0);
> +      tree arg1 = (*arglist)[1];
> +      tree arg1_type = TREE_TYPE (arg1);
> +      tree arg2 = (*arglist)[2];
> +      tree arg2_type = TREE_TYPE (arg2);
> +
> +      /* All 3 arguments must be vectors of (signed or unsigned) (int or
> +	 __int128) and the types must be compatible.  */
> +      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type)
> +	  || !lang_hooks.types_compatible_p (arg1_type, arg2_type))
> +	goto bad;
> +
> +      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
> +	{
> +	  /* For {un}signed ints,
> +	     vec_adde (va, vb, carryv) == vec_add (vec_add (va, vb),
> +						   vec_and (carryv, 1)).
> +	     vec_sube (va, vb, carryv) == vec_sub (vec_sub (va, vb),
> +						   vec_and (carryv, 1)).  */

Also commented out in the original code.  Since it's dead code, maybe
it's worth enhancing the comment to clarify why this is disabled?

> +	  case E_SImode:
> +	    {
> +	      tree add_sub_builtin;
> +
> +	      vec<tree, va_gc> *params = make_tree_vector ();
> +	      vec_safe_push (params, arg0);
> +	      vec_safe_push (params, arg1);
> +
> +	      if (fcode == RS6000_OVLD_VEC_ADDE)
> +		add_sub_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADD];
> +	      else
> +		add_sub_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUB];
> +
> +	      tree call
> +		= altivec_resolve_new_overloaded_builtin (loc,
> +							  add_sub_builtin,
> +							  params);
> +	      tree const1 = build_int_cstu (TREE_TYPE (arg0_type), 1);
> +	      tree ones_vector = build_vector_from_val (arg0_type, const1);
> +	      tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR, arg0_type,
> +					       arg2, ones_vector);
> +	      params = make_tree_vector ();
> +	      vec_safe_push (params, call);
> +	      vec_safe_push (params, and_expr);
> +	      return altivec_resolve_new_overloaded_builtin (loc,
> +							     add_sub_builtin,
> +							     params);
> +	    }
> +	  /* For {un}signed __int128s use the vaddeuqm/vsubeuqm instruction
> +	     directly.  */
> +	  case E_TImode:
> +	    break;
> +
> +	  /* Types other than {un}signed int and {un}signed __int128
> +		are errors.  */
> +	  default:
> +	    goto bad;
> +	}
> +    }
> +
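
For reference, a source-level sketch of the SImode expansion above
(function name is hypothetical):

    #include <altivec.h>

    vector unsigned int
    adde_expansion (vector unsigned int va, vector unsigned int vb,
                    vector unsigned int carry)
    {
      /* vec_adde (va, vb, carry)
         == vec_add (vec_add (va, vb), vec_and (carry, 1)).  */
      return vec_add (vec_add (va, vb),
                      vec_and (carry, vec_splats (1u)));
    }
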
> +  if (fcode == RS6000_OVLD_VEC_ADDEC || fcode == RS6000_OVLD_VEC_SUBEC)
> +    {
> +      /* vec_addec and vec_subec needs to be special cased because there is
> +	 no instruction for the {un}signed int version.  */
> +      if (nargs != 3)
> +	{
> +	  const char *name;
> +	  name = fcode == RS6000_OVLD_VEC_ADDEC ? "vec_addec" : "vec_subec";
> +	  error ("builtin %qs only accepts 3 arguments", name);
> +	  return error_mark_node;
> +	}
> +
> +      tree arg0 = (*arglist)[0];
> +      tree arg0_type = TREE_TYPE (arg0);
> +      tree arg1 = (*arglist)[1];
> +      tree arg1_type = TREE_TYPE (arg1);
> +      tree arg2 = (*arglist)[2];
> +      tree arg2_type = TREE_TYPE (arg2);
> +
> +      /* All 3 arguments must be vectors of (signed or unsigned) (int or
> +	 __int128) and the types must be compatible.  */
> +      if (TREE_CODE (arg0_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!lang_hooks.types_compatible_p (arg0_type, arg1_type)
> +	  || !lang_hooks.types_compatible_p (arg1_type, arg2_type))
> +	goto bad;
> +
> +      switch (TYPE_MODE (TREE_TYPE (arg0_type)))
> +	{
> +	  /* For {un}signed ints,
> +	      vec_addec (va, vb, carryv) ==
> +				vec_or (vec_addc (va, vb),
> +					vec_addc (vec_add (va, vb),
> +						  vec_and (carryv, 0x1))).  */

Similar here.

> +	  case E_SImode:
> +	    {
> +	    /* Use save_expr to ensure that operands used more than once
> +		that may have side effects (like calls) are only evaluated
> +		once.  */
> +	    tree as_builtin;
> +	    tree as_c_builtin;
> +
> +	    arg0 = save_expr (arg0);
> +	    arg1 = save_expr (arg1);
> +	    vec<tree, va_gc> *params = make_tree_vector ();
> +	    vec_safe_push (params, arg0);
> +	    vec_safe_push (params, arg1);
> +
> +	    if (fcode == RS6000_OVLD_VEC_ADDEC)
> +	      as_c_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADDC];
> +	    else
> +	      as_c_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUBC];
> +
> +	    tree call1 = altivec_resolve_new_overloaded_builtin (loc,
> +								 as_c_builtin,
> +								 params);
> +	    params = make_tree_vector ();
> +	    vec_safe_push (params, arg0);
> +	    vec_safe_push (params, arg1);
> +
> +

Extra blank line?


> +	    if (fcode == RS6000_OVLD_VEC_ADDEC)
> +	      as_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_ADD];
> +	    else
> +	      as_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_SUB];
> +
> +	    tree call2 = altivec_resolve_new_overloaded_builtin (loc,
> +								 as_builtin,
> +								 params);
> +	    tree const1 = build_int_cstu (TREE_TYPE (arg0_type), 1);
> +	    tree ones_vector = build_vector_from_val (arg0_type, const1);
> +	    tree and_expr = fold_build2_loc (loc, BIT_AND_EXPR, arg0_type,
> +					     arg2, ones_vector);
> +	    params = make_tree_vector ();
> +	    vec_safe_push (params, call2);
> +	    vec_safe_push (params, and_expr);
> +	    call2 = altivec_resolve_new_overloaded_builtin (loc, as_c_builtin,
> +							    params);
> +	    params = make_tree_vector ();
> +	    vec_safe_push (params, call1);
> +	    vec_safe_push (params, call2);
> +	    tree or_builtin = rs6000_builtin_decls_x[RS6000_OVLD_VEC_OR];
> +	    return altivec_resolve_new_overloaded_builtin (loc, or_builtin,
> +							   params);
> +	    }
> +	  /* For {un}signed __int128s use the vaddecuq/vsubbecuq
> +	     instructions.  This occurs through normal processing.  */
> +	  case E_TImode:
> +	    break;
> +
> +	  /* Types other than {un}signed int and {un}signed __int128
> +		are errors.  */
> +	  default:
> +	    goto bad;
> +	}
> +    }

ok
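
Likewise, a sketch of the carry-out expansion just above, under the
same assumptions as the vec_adde sketch earlier:

    #include <altivec.h>

    vector unsigned int
    addec_expansion (vector unsigned int va, vector unsigned int vb,
                     vector unsigned int carry)
    {
      vector unsigned int cin = vec_and (carry, vec_splats (1u));
      /* vec_or (vec_addc (va, vb),
                 vec_addc (vec_add (va, vb), cin)).  */
      return vec_or (vec_addc (va, vb),
                     vec_addc (vec_add (va, vb), cin));
    }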

> +
> +  /* For now treat vec_splats and vec_promote as the same.  */
> +  if (fcode == RS6000_OVLD_VEC_SPLATS || fcode == RS6000_OVLD_VEC_PROMOTE)
> +    {
> +      tree type, arg;
> +      int size;
> +      int i;
> +      bool unsigned_p;
> +      vec<constructor_elt, va_gc> *vec;
> +      const char *name;
> +      name = fcode == RS6000_OVLD_VEC_SPLATS ? "vec_splats" : "vec_promote";
> +
> +      if (fcode == RS6000_OVLD_VEC_SPLATS && nargs != 1)
> +	{
> +	  error ("builtin %qs only accepts 1 argument", name);
> +	  return error_mark_node;
> +	}
> +      if (fcode == RS6000_OVLD_VEC_PROMOTE && nargs != 2)
> +	{
> +	  error ("builtin %qs only accepts 2 arguments", name);
> +	  return error_mark_node;
> +	}
> +      /* Ignore promote's element argument.  */
> +      if (fcode == RS6000_OVLD_VEC_PROMOTE
> +	  && !INTEGRAL_TYPE_P (TREE_TYPE ((*arglist)[1])))
> +	goto bad;
> +
> +      arg = (*arglist)[0];
> +      type = TREE_TYPE (arg);
> +      if (!SCALAR_FLOAT_TYPE_P (type)
> +	  && !INTEGRAL_TYPE_P (type))
> +	goto bad;
> +      unsigned_p = TYPE_UNSIGNED (type);
> +      switch (TYPE_MODE (type))
> +	{
> +	  case E_TImode:
> +	    type = (unsigned_p ? unsigned_V1TI_type_node : V1TI_type_node);
> +	    size = 1;
> +	    break;
> +	  case E_DImode:
> +	    type = (unsigned_p ? unsigned_V2DI_type_node : V2DI_type_node);
> +	    size = 2;
> +	    break;
> +	  case E_SImode:
> +	    type = (unsigned_p ? unsigned_V4SI_type_node : V4SI_type_node);
> +	    size = 4;
> +	    break;
> +	  case E_HImode:
> +	    type = (unsigned_p ? unsigned_V8HI_type_node : V8HI_type_node);
> +	    size = 8;
> +	    break;
> +	  case E_QImode:
> +	    type = (unsigned_p ? unsigned_V16QI_type_node : V16QI_type_node);
> +	    size = 16;
> +	    break;
> +	  case E_SFmode:
> +	    type = V4SF_type_node;
> +	    size = 4;
> +	    break;
> +	  case E_DFmode:
> +	    type = V2DF_type_node;
> +	    size = 2;
> +	    break;
> +	  default:
> +	    goto bad;
> +	}
> +      arg = save_expr (fold_convert (TREE_TYPE (type), arg));
> +      vec_alloc (vec, size);
> +      for (i = 0; i < size; i++)
> +	{
> +	  constructor_elt elt = {NULL_TREE, arg};
> +	  vec->quick_push (elt);
> +	}
> +      return build_constructor (type, vec);
> +    }
> +
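
The CONSTRUCTOR built above just replicates the scalar into every
element; roughly, at source level (illustrative sketch only):

    vector signed int
    splats_example (int x)
    {
      return (vector signed int) { x, x, x, x };
    }
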
> +  /* For now use pointer tricks to do the extraction, unless we are on VSX
> +     extracting a double from a constant offset.  */
> +  if (fcode == RS6000_OVLD_VEC_EXTRACT)
> +    {
> +      tree arg1;
> +      tree arg1_type;
> +      tree arg2;
> +      tree arg1_inner_type;
> +      tree decl, stmt;
> +      tree innerptrtype;
> +      machine_mode mode;
> +
> +      /* No second argument. */
> +      if (nargs != 2)
> +	{
> +	  error ("builtin %qs only accepts 2 arguments", "vec_extract");
> +	  return error_mark_node;
> +	}
> +
> +      arg2 = (*arglist)[1];
> +      arg1 = (*arglist)[0];
> +      arg1_type = TREE_TYPE (arg1);
> +
> +      if (TREE_CODE (arg1_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
> +	goto bad;
> +
> +      /* See if we can optimize vec_extracts with the current VSX instruction
> +	 set.  */
> +      mode = TYPE_MODE (arg1_type);
> +      if (VECTOR_MEM_VSX_P (mode))
> +
> +	{
> +	  tree call = NULL_TREE;
> +	  int nunits = GET_MODE_NUNITS (mode);
> +
> +	  arg2 = fold_for_warn (arg2);
> +
> +	  /* If the second argument is an integer constant, generate
> +	     the built-in code if we can.  We need 64-bit and direct
> +	     move to extract the small integer vectors.  */
> +	  if (TREE_CODE (arg2) == INTEGER_CST)
> +	    {
> +	      wide_int selector = wi::to_wide (arg2);
> +	      selector = wi::umod_trunc (selector, nunits);
> +	      arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> +	      switch (mode)
> +		{
> +		default:
> +		  break;
> +
> +		case E_V1TImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V1TI];
> +		  break;
> +
> +		case E_V2DFmode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DF];
> +		  break;
> +
> +		case E_V2DImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DI];
> +		  break;
> +
> +		case E_V4SFmode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SF];
> +		  break;
> +
> +		case E_V4SImode:
> +		  if (TARGET_DIRECT_MOVE_64BIT)
> +		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SI];
> +		  break;
> +
> +		case E_V8HImode:
> +		  if (TARGET_DIRECT_MOVE_64BIT)
> +		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V8HI];
> +		  break;
> +
> +		case E_V16QImode:
> +		  if (TARGET_DIRECT_MOVE_64BIT)
> +		    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V16QI];
> +		  break;
> +		}
> +	    }
> +
> +	  /* If the second argument is variable, we can optimize it if we are
> +	     generating 64-bit code on a machine with direct move.  */
> +	  else if (TREE_CODE (arg2) != INTEGER_CST && TARGET_DIRECT_MOVE_64BIT)
> +	    {
> +	      switch (mode)
> +		{
> +		default:
> +		  break;
> +
> +		case E_V2DFmode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DF];
> +		  break;
> +
> +		case E_V2DImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V2DI];
> +		  break;
> +
> +		case E_V4SFmode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SF];
> +		  break;
> +
> +		case E_V4SImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V4SI];
> +		  break;
> +
> +		case E_V8HImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V8HI];
> +		  break;
> +
> +		case E_V16QImode:
> +		  call = rs6000_builtin_decls_x[RS6000_BIF_VEC_EXT_V16QI];
> +		  break;
> +		}
> +	    }
> +
> +	  if (call)
> +	    {
> +	      tree result = build_call_expr (call, 2, arg1, arg2);
> +	      /* Coerce the result to vector element type.  May be no-op.  */
> +	      arg1_inner_type = TREE_TYPE (arg1_type);
> +	      result = fold_convert (arg1_inner_type, result);
> +	      return result;
> +	    }
> +	}
> +
> +      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2). */
> +      arg1_inner_type = TREE_TYPE (arg1_type);
> +      arg2 = build_binary_op (loc, BIT_AND_EXPR, arg2,
> +			      build_int_cst (TREE_TYPE (arg2),
> +					     TYPE_VECTOR_SUBPARTS (arg1_type)
> +					     - 1), 0);
> +      decl = build_decl (loc, VAR_DECL, NULL_TREE, arg1_type);
> +      DECL_EXTERNAL (decl) = 0;
> +      TREE_PUBLIC (decl) = 0;
> +      DECL_CONTEXT (decl) = current_function_decl;
> +      TREE_USED (decl) = 1;
> +      TREE_TYPE (decl) = arg1_type;
> +      TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
> +      if (c_dialect_cxx ())
> +	{
> +	  stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
> +			 NULL_TREE, NULL_TREE);
> +	  SET_EXPR_LOCATION (stmt, loc);
> +	}
> +      else
> +	{
> +	  DECL_INITIAL (decl) = arg1;
> +	  stmt = build1 (DECL_EXPR, arg1_type, decl);
> +	  TREE_ADDRESSABLE (decl) = 1;
> +	  SET_EXPR_LOCATION (stmt, loc);
> +	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
> +	}
> +
> +      innerptrtype = build_pointer_type (arg1_inner_type);
> +
> +      stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
> +      stmt = convert (innerptrtype, stmt);
> +      stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
> +      stmt = build_indirect_ref (loc, stmt, RO_NULL);
> +
> +      /* PR83660: We mark this as having side effects so that
> +	 downstream in fold_build_cleanup_point_expr () it will get a
> +	 CLEANUP_POINT_EXPR.  If it does not we can run into an ICE
> +	 later in gimplify_cleanup_point_expr ().  Potentially this
> +	 causes missed optimization because there actually is no side
> +	 effect.  */
> +      if (c_dialect_cxx ())
> +	TREE_SIDE_EFFECTS (stmt) = 1;
> +
> +      return stmt;
> +    }

ok
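
For reference, the non-VSX "pointer trick" built above corresponds
roughly to this source (names hypothetical):

    int
    extract_fallback (vector signed int v, unsigned int n)
    {
      vector signed int tmp = v;
      /* Index is masked to (nunits - 1), as in the resolver.  */
      return *((int *) &tmp + (n & 3));
    }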

> +
> +  /* For now use pointer tricks to do the insertion, unless we are on VSX
> +     inserting a double to a constant offset..  */

Too many ending periods. :-)  (Also in the original.)

> +  if (fcode == RS6000_OVLD_VEC_INSERT)
> +    {
> +      tree arg0;
> +      tree arg1;
> +      tree arg2;
> +      tree arg1_type;
> +      tree decl, stmt;
> +      machine_mode mode;
> +
> +      /* No second or third arguments. */
> +      if (nargs != 3)
> +	{
> +	  error ("builtin %qs only accepts 3 arguments", "vec_insert");
> +	  return error_mark_node;
> +	}
> +
> +      arg0 = (*arglist)[0];
> +      arg1 = (*arglist)[1];
> +      arg1_type = TREE_TYPE (arg1);
> +      arg2 = fold_for_warn ((*arglist)[2]);
> +
> +      if (TREE_CODE (arg1_type) != VECTOR_TYPE)
> +	goto bad;
> +      if (!INTEGRAL_TYPE_P (TREE_TYPE (arg2)))
> +	goto bad;
> +
> +      /* If we can use the VSX xxpermdi instruction, use that for insert.  */
> +      mode = TYPE_MODE (arg1_type);
> +      if ((mode == V2DFmode || mode == V2DImode) && VECTOR_UNIT_VSX_P (mode)
> +	  && TREE_CODE (arg2) == INTEGER_CST)
> +	{
> +	  wide_int selector = wi::to_wide (arg2);
> +	  selector = wi::umod_trunc (selector, 2);
> +	  tree call = NULL_TREE;
> +
> +	  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> +	  if (mode == V2DFmode)
> +	    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V2DF];
> +	  else if (mode == V2DImode)
> +	    call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V2DI];
> +
> +	  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
> +	     reversed.  */
> +	  if (call)
> +	    return build_call_expr (call, 3, arg1, arg0, arg2);
> +	}
> +      else if (mode == V1TImode && VECTOR_UNIT_VSX_P (mode)
> +	       && TREE_CODE (arg2) == INTEGER_CST)
> +	{
> +	  tree call = rs6000_builtin_decls_x[RS6000_BIF_VEC_SET_V1TI];
> +	  wide_int selector = wi::zero(32);
> +
> +	  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
> +	  /* Note, __builtin_vec_insert_<xxx> has vector and scalar types
> +	     reversed.  */
> +	  return build_call_expr (call, 3, arg1, arg0, arg2);
> +	}
> +
> +      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0 with
> +	 VIEW_CONVERT_EXPR.  i.e.:
> +	 D.3192 = v1;
> +	 _1 = n & 3;
> +	 VIEW_CONVERT_EXPR<int[4]>(D.3192)[_1] = i;
> +	 v1 = D.3192;
> +	 D.3194 = v1;  */
> +      if (TYPE_VECTOR_SUBPARTS (arg1_type) == 1)
> +	arg2 = build_int_cst (TREE_TYPE (arg2), 0);
> +      else
> +	arg2 = build_binary_op (loc, BIT_AND_EXPR, arg2,
> +				build_int_cst (TREE_TYPE (arg2),
> +					       TYPE_VECTOR_SUBPARTS (arg1_type)
> +					       - 1), 0);
> +      decl = build_decl (loc, VAR_DECL, NULL_TREE, arg1_type);
> +      DECL_EXTERNAL (decl) = 0;
> +      TREE_PUBLIC (decl) = 0;
> +      DECL_CONTEXT (decl) = current_function_decl;
> +      TREE_USED (decl) = 1;
> +      TREE_TYPE (decl) = arg1_type;
> +      TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
> +      TREE_ADDRESSABLE (decl) = 1;
> +      if (c_dialect_cxx ())
> +	{
> +	  stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
> +			 NULL_TREE, NULL_TREE);
> +	  SET_EXPR_LOCATION (stmt, loc);
> +	}
> +      else
> +	{
> +	  DECL_INITIAL (decl) = arg1;
> +	  stmt = build1 (DECL_EXPR, arg1_type, decl);
> +	  SET_EXPR_LOCATION (stmt, loc);
> +	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
> +	}
> +
> +      if (TARGET_VSX)
> +	{
> +	  stmt = build_array_ref (loc, stmt, arg2);
> +	  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> +			      convert (TREE_TYPE (stmt), arg0));
> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +	}
> +      else
> +	{
> +	  tree arg1_inner_type;
> +	  tree innerptrtype;
> +	  arg1_inner_type = TREE_TYPE (arg1_type);
> +	  innerptrtype = build_pointer_type (arg1_inner_type);
> +
> +	  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
> +	  stmt = convert (innerptrtype, stmt);
> +	  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
> +	  stmt = build_indirect_ref (loc, stmt, RO_NULL);
> +	  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
> +			 convert (TREE_TYPE (stmt), arg0));
> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +	}
> +      return stmt;
> +    }
> +
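
The fallback path above similarly corresponds roughly to this source
(a sketch only; the VSX path uses an array reference instead):

    vector signed int
    insert_fallback (int x, vector signed int v, unsigned int n)
    {
      vector signed int tmp = v;
      ((int *) &tmp)[n & 3] = x;
      return tmp;
    }
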
> +  for (n = 0;
> +       !VOID_TYPE_P (TREE_VALUE (fnargs)) && n < nargs;
> +       fnargs = TREE_CHAIN (fnargs), n++)
> +    {
> +      tree decl_type = TREE_VALUE (fnargs);
> +      tree arg = (*arglist)[n];
> +      tree type;
> +
> +      if (arg == error_mark_node)
> +	return error_mark_node;
> +
> +      if (n >= MAX_OVLD_ARGS)
> +	abort ();
> +
> +      arg = default_conversion (arg);
> +
> +      /* The C++ front-end converts float * to const void * using
> +	 NOP_EXPR<const void *> (NOP_EXPR<void *> (x)).  */
> +      type = TREE_TYPE (arg);
> +      if (POINTER_TYPE_P (type)
> +	  && TREE_CODE (arg) == NOP_EXPR
> +	  && lang_hooks.types_compatible_p (TREE_TYPE (arg),
> +					    const_ptr_type_node)
> +	  && lang_hooks.types_compatible_p (TREE_TYPE (TREE_OPERAND (arg, 0)),
> +					    ptr_type_node))
> +	{
> +	  arg = TREE_OPERAND (arg, 0);
> +	  type = TREE_TYPE (arg);
> +	}
> +
> +      /* Remove the const from the pointers to simplify the overload
> +	 matching further down.  */
> +      if (POINTER_TYPE_P (decl_type)
> +	  && POINTER_TYPE_P (type)
> +	  && TYPE_QUALS (TREE_TYPE (type)) != 0)
> +	{
> +	  if (TYPE_READONLY (TREE_TYPE (type))
> +	      && !TYPE_READONLY (TREE_TYPE (decl_type)))
> +	    warning (0, "passing argument %d of %qE discards qualifiers from "
> +		     "pointer target type", n + 1, fndecl);
> +	  type = build_pointer_type (build_qualified_type (TREE_TYPE (type),
> +							   0));
> +	  arg = fold_convert (type, arg);
> +	}
> +
> +      /* For RS6000_OVLD_VEC_LXVL, convert any const * to its non constant
> +	 equivalent to simplify the overload matching below.  */
> +      if (fcode == RS6000_OVLD_VEC_LXVL)
> +	{
> +	  if (POINTER_TYPE_P (type)
> +	      && TYPE_READONLY (TREE_TYPE (type)))
> +	    {
> +	      type = build_pointer_type (build_qualified_type (
> +						TREE_TYPE (type),0));
> +	      arg = fold_convert (type, arg);
> +	    }
> +	}
> +
> +      args[n] = arg;
> +      types[n] = type;
> +    }
> +
> +  /* If the number of arguments did not match the prototype, return NULL
> +     and the generic code will issue the appropriate error message.  */
> +  if (!VOID_TYPE_P (TREE_VALUE (fnargs)) || n < nargs)
> +    return NULL;
> +
> +  if (fcode == RS6000_OVLD_VEC_STEP)
> +    {
> +      if (TREE_CODE (types[0]) != VECTOR_TYPE)
> +	goto bad;
> +
> +      return build_int_cst (NULL_TREE, TYPE_VECTOR_SUBPARTS (types[0]));
> +    }
> +
> +  {
> +    bool unsupported_builtin = false;
> +    enum rs6000_gen_builtins overloaded_code;
> +    bool supported = false;
> +    ovlddata *instance = rs6000_overload_info[adj_fcode].first_instance;
> +    gcc_assert (instance != NULL);
> +
> +    /* Need to special case __builtin_cmpb because the overloaded forms
> +       of this function take (unsigned int, unsigned int) or (unsigned
> +       long long int, unsigned long long int).  Since C conventions
> +       allow the respective argument types to be implicitly coerced into
> +       each other, the default handling does not provide adequate
> +       discrimination between the desired forms of the function.  */
> +    if (fcode == RS6000_OVLD_SCAL_CMPB)
> +      {
> +	machine_mode arg1_mode = TYPE_MODE (types[0]);
> +	machine_mode arg2_mode = TYPE_MODE (types[1]);
> +
> +	if (nargs != 2)
> +	  {
> +	    error ("builtin %qs only accepts 2 arguments", "__builtin_cmpb");
> +	    return error_mark_node;
> +	  }
> +
> +	/* If any supplied arguments are wider than 32 bits, resolve to
> +	   64-bit variant of built-in function.  */
> +	if ((GET_MODE_PRECISION (arg1_mode) > 32)
> +	    || (GET_MODE_PRECISION (arg2_mode) > 32))
> +	  {
> +	    /* Assure all argument and result types are compatible with
> +	       the built-in function represented by RS6000_BIF_CMPB.  */
> +	    overloaded_code = RS6000_BIF_CMPB;
> +	  }
> +	else
> +	  {
> +	    /* Assure all argument and result types are compatible with
> +	       the built-in function represented by RS6000_BIF_CMPB_32.  */
> +	    overloaded_code = RS6000_BIF_CMPB_32;
> +	  }
> +
> +	while (instance && instance->bifid != overloaded_code)
> +	  instance = instance->next;
> +
> +	gcc_assert (instance != NULL);
> +	tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
> +	tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> +	tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
> +
> +	if (rs6000_new_builtin_type_compatible (types[0], parmtype0)
> +	    && rs6000_new_builtin_type_compatible (types[1], parmtype1))
> +	  {
> +	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
> +		&& rs6000_new_builtin_is_supported (instance->bifid))
> +	      {
> +		tree ret_type = TREE_TYPE (instance->fntype);
> +		return altivec_build_new_resolved_builtin (args, n, fntype,
> +							   ret_type,
> +							   instance->bifid,
> +							   fcode);
> +	      }
> +	    else
> +	      unsupported_builtin = true;
> +	  }
> +      }
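
To illustrate the discrimination described above (a sketch; assumes a
target where __builtin_cmpb is enabled):

    unsigned int
    cmpb_32 (unsigned int a, unsigned int b)
    {
      return __builtin_cmpb (a, b);   /* RS6000_BIF_CMPB_32 */
    }

    unsigned long long
    cmpb_64 (unsigned long long a, unsigned long long b)
    {
      return __builtin_cmpb (a, b);   /* RS6000_BIF_CMPB */
    }
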
> +    else if (fcode == RS6000_OVLD_VEC_VSIE)

OK, noting that this is foo_VEC_VSIEDP in the original code. (DP
indicator dropped).


> +      {
> +	machine_mode arg1_mode = TYPE_MODE (types[0]);
> +
> +	if (nargs != 2)
> +	  {
> +	    error ("builtin %qs only accepts 2 arguments",
> +		   "scalar_insert_exp");
> +	    return error_mark_node;
> +	  }
> +
> +	/* If supplied first argument is wider than 64 bits, resolve to
> +	   128-bit variant of built-in function.  */
> +	if (GET_MODE_PRECISION (arg1_mode) > 64)
> +	  {
> +	    /* If first argument is of float variety, choose variant
> +	       that expects __ieee128 argument.  Otherwise, expect
> +	       __int128 argument.  */

Could use some "a" and/or "the" in the comment there.  Similar below.
This matches the comment in the original code, so nbd. :-)


> +	    if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> +	      overloaded_code = RS6000_BIF_VSIEQPF;
> +	    else
> +	      overloaded_code = RS6000_BIF_VSIEQP;
> +	  }
> +	else
> +	  {
> +	    /* If first argument is of float variety, choose variant
> +	       that expects double argument.  Otherwise, expect
> +	       long long int argument.  */
> +	    if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
> +	      overloaded_code = RS6000_BIF_VSIEDPF;
> +	    else
> +	      overloaded_code = RS6000_BIF_VSIEDP;
> +	  }
> +
> +	while (instance && instance->bifid != overloaded_code)
> +	  instance = instance->next;
> +
> +	gcc_assert (instance != NULL);
> +	tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
> +	tree parmtype0 = TREE_VALUE (TYPE_ARG_TYPES (fntype));
> +	tree parmtype1 = TREE_VALUE (TREE_CHAIN (TYPE_ARG_TYPES (fntype)));
> +
> +	if (rs6000_new_builtin_type_compatible (types[0], parmtype0)
> +	    && rs6000_new_builtin_type_compatible (types[1], parmtype1))
> +	  {
> +	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
> +		&& rs6000_new_builtin_is_supported (instance->bifid))
> +	      {
> +		tree ret_type = TREE_TYPE (instance->fntype);
> +		return altivec_build_new_resolved_builtin (args, n, fntype,
> +							   ret_type,
> +							   instance->bifid,
> +							   fcode);
> +	      }
> +	    else
> +	      unsupported_builtin = true;
> +	  }
> +      }
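
A sketch of how two of the scalar_insert_exp variants get selected by
the logic above (assumes -mcpu=power9 and altivec.h):

    double
    sie_int (unsigned long long sig, unsigned long long exp)
    {
      return scalar_insert_exp (sig, exp);   /* RS6000_BIF_VSIEDP */
    }

    double
    sie_float (double frac, unsigned long long exp)
    {
      return scalar_insert_exp (frac, exp);  /* RS6000_BIF_VSIEDPF */
    }
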
> +    else
> +      {
> +	/* Functions with no arguments can have only one overloaded
> +	   instance.  */
> +	gcc_assert (n > 0 || !instance->next);
> +
> +	for (; instance != NULL; instance = instance->next)
> +	  {
> +	    bool mismatch = false;
> +	    tree nextparm = TYPE_ARG_TYPES (instance->fntype);
> +
> +	    for (unsigned int arg_i = 0;
> +		 arg_i < nargs && nextparm != NULL;
> +		 arg_i++)
> +	      {
> +		tree parmtype = TREE_VALUE (nextparm);
> +		if (!rs6000_new_builtin_type_compatible (types[arg_i],
> +							 parmtype))
> +		  {
> +		    mismatch = true;
> +		    break;
> +		  }
> +		nextparm = TREE_CHAIN (nextparm);
> +	      }
> +
> +	    if (mismatch)
> +	      continue;
> +
> +	    supported = rs6000_new_builtin_is_supported (instance->bifid);
> +	    if (rs6000_builtin_decl (instance->bifid, false) != error_mark_node
> +		&& supported)
> +	      {
> +		tree fntype = rs6000_builtin_info_x[instance->bifid].fntype;
> +		tree ret_type = TREE_TYPE (instance->fntype);
> +		return altivec_build_new_resolved_builtin (args, n, fntype,
> +							   ret_type,
> +							   instance->bifid,
> +							   fcode);
> +	      }
> +	    else
> +	      {
> +		unsupported_builtin = true;
> +		break;
> +	      }
> +	  }
> +      }
> +
> +    if (unsupported_builtin)
> +      {
> +	const char *name = rs6000_overload_info[adj_fcode].ovld_name;
> +	if (!supported)
> +	  {
> +	    const char *internal_name
> +	      = rs6000_builtin_info_x[instance->bifid].bifname;
> +	    /* An error message making reference to the name of the
> +	       non-overloaded function has already been issued.  Add
> +	       clarification of the previous message.  */
> +	    rich_location richloc (line_table, input_location);
> +	    inform (&richloc, "builtin %qs requires builtin %qs",
> +		    name, internal_name);
> +	  }
> +	else
> +	  error ("%qs is not supported in this compiler configuration", name);
> +	/* If an error-representing  result tree was returned from
> +	   altivec_build_resolved_builtin above, use it.  */

Extra space after error-representing.  Also in original code.


> +	/*
> +	return (result != NULL) ? result : error_mark_node;
> +	*/
> +	return error_mark_node;
> +      }
> +  }
> + bad:
> +  {
> +    const char *name = rs6000_overload_info[adj_fcode].ovld_name;
> +    error ("invalid parameter combination for AltiVec intrinsic %qs", name);
> +    return error_mark_node;
> +  }
> +}

ok


> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index e8625d17d18..2c68aa3580c 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -12971,6 +12971,59 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>    return false;
>  }
> 
> +/* Check whether a builtin function is supported in this target
> +   configuration.  */
> +bool
> +rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
> +{
> +  switch (rs6000_builtin_info_x[(size_t) fncode].enable)
> +    {
> +    default:
> +      gcc_unreachable ();
> +    case ENB_ALWAYS:
> +      return true;
> +    case ENB_P5:
> +      return TARGET_POPCNTB;
> +    case ENB_P6:
> +      return TARGET_CMPB;
> +    case ENB_ALTIVEC:
> +      return TARGET_ALTIVEC;
> +    case ENB_CELL:
> +      return TARGET_ALTIVEC && rs6000_cpu == PROCESSOR_CELL;
> +    case ENB_VSX:
> +      return TARGET_VSX;
> +    case ENB_P7:
> +      return TARGET_POPCNTD;
> +    case ENB_P7_64:
> +      return TARGET_POPCNTD && TARGET_POWERPC64;
> +    case ENB_P8:
> +      return TARGET_DIRECT_MOVE;
> +    case ENB_P8V:
> +      return TARGET_P8_VECTOR;
> +    case ENB_P9:
> +      return TARGET_MODULO;
> +    case ENB_P9_64:
> +      return TARGET_MODULO && TARGET_POWERPC64;
> +    case ENB_P9V:
> +      return TARGET_P9_VECTOR;
> +    case ENB_IEEE128_HW:
> +      return TARGET_FLOAT128_HW;
> +    case ENB_DFP:
> +      return TARGET_DFP;
> +    case ENB_CRYPTO:
> +      return TARGET_CRYPTO;
> +    case ENB_HTM:
> +      return TARGET_HTM;
> +    case ENB_P10:
> +      return TARGET_POWER10;
> +    case ENB_P10_64:
> +      return TARGET_POWER10 && TARGET_POWERPC64;
> +    case ENB_MMA:
> +      return TARGET_MMA;
> +    }
> +  gcc_unreachable ();
> +}

ok

> +
>  /* Expand an expression EXP that calls a built-in function,
>     with result going to TARGET if that's convenient
>     (and in mode MODE if that's convenient).
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
> index f3d6156400a..f65932e1cd5 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -2314,7 +2314,7 @@ write_decls (void)
> 
>    fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n");
>    fprintf (header_file,
> -	   "extern bool rs6000_new_builtin_is_supported_p "
> +	   "extern bool rs6000_new_builtin_is_supported "
>  	   "(rs6000_gen_builtins);\n");
>    fprintf (header_file,
>  	   "extern tree rs6000_builtin_decl (unsigned, "


ok.

Thanks
-Will

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza
  2021-09-01 16:13 ` [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza Bill Schmidt
@ 2021-09-13 17:53   ` will schmidt
  2021-09-16 22:52   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: will schmidt @ 2021-09-13 17:53 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches; +Cc: dje.gcc, segher

On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> I over-restricted use of __builtin_mffsl, since I was unaware that it
> automatically uses mffs when mffsl is not available.  Paul Clarke pointed
> this out in discussion of his SSE 4.1 compatibility patches.
> 
> 2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000-call.c (__builtin_mffsl): Move from [power9]
> 	to [always].
> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index 6a28d5189f8..a8c6b9e988f 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -208,6 +208,12 @@
>    double __builtin_mffs ();
>      MFFS rs6000_mffs {}
> 
> +; Although the mffsl instruction is only available on POWER9 and later
> +; processors, this builtin automatically falls back to mffs on older
> +; platforms.  Thus it appears here in the [always] stanza.
> +  double __builtin_mffsl ();
> +    MFFSL rs6000_mffsl {}
> +
>  ; This thing really assumes long double == __ibm128, and I'm told it has
>  ; been used as such within libgcc.  Given that __builtin_pack_ibm128
>  ; exists for the same purpose, this should really not be used at all.
> @@ -2784,9 +2790,6 @@
>    signed long long __builtin_darn_raw ();
>      DARN_RAW darn_raw {}
> 
> -  double __builtin_mffsl ();
> -    MFFSL rs6000_mffsl {}
> -
>    const signed int __builtin_dtstsfi_eq_dd (const int<6>, _Decimal64);
>      TSTSFI_EQ_DD dfptstsfi_eq_dd {}
> 
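
For reference, a usage sketch; on pre-Power9 CPUs this degrades
gracefully to a full mffs, which is why it can live in [always]:

    double
    read_fpscr_control (void)
    {
      return __builtin_mffsl ();
    }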


Looks reasonable,
Thanks
-Will


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins
  2021-09-01 16:13 ` [PATCH 03/18] rs6000: Handle gimple folding of target built-ins Bill Schmidt
@ 2021-09-13 18:42   ` will schmidt
  2021-09-14 22:36     ` Bill Schmidt
  2021-09-16 22:58   ` Segher Boessenkool
  1 sibling, 1 reply; 52+ messages in thread
From: will schmidt @ 2021-09-13 18:42 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches; +Cc: dje.gcc, segher

On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> This is another patch that looks bigger than it really is.  Because we
> have a new namespace for the builtins, allowing us to have both the old
> and new builtin infrastructure supported at once, we need versions of
> these functions that use the new builtin namespace.  Otherwise the code is
> unchanged.
> 
> 2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
> 	New forward decl.
> 	(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
> 	(rs6000_new_builtin_valid_without_lhs): New function.
> 	(rs6000_gimple_fold_new_mma_builtin): Likewise.
> 	(rs6000_gimple_fold_new_builtin): Likewise.
> ---
>  gcc/config/rs6000/rs6000-call.c | 1165 +++++++++++++++++++++++++++++++
>  1 file changed, 1165 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index 2c68aa3580c..eae4e15df1e 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
>  static void rs6000_common_init_builtins (void);
>  static void htm_init_builtins (void);
>  static void mma_init_builtins (void);
> +static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
> 
> 
>  /* Hash table to keep track of the argument types for builtin functions.  */
> @@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
>  bool
>  rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>  {
> +  if (new_builtins_are_live)
> +    return rs6000_gimple_fold_new_builtin (gsi);
> +
>    gimple *stmt = gsi_stmt (*gsi);
>    tree fndecl = gimple_call_fndecl (stmt);
>    gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);

ok

> @@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>    return false;
>  }
> 
> +/*  Helper function to sort out which built-ins may be valid without having
> +    a LHS.  */
> +static bool
> +rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
> +				      tree fndecl)
> +{
> +  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
> +    return true;

Is that a better or improved version of the code as seen in
rs6000_builtin_valid_without_lhs?  That is:

>  if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
>    return true;

OK either way.


> +
> +  switch (fn_code)
> +    {
> +    case RS6000_BIF_STVX_V16QI:
> +    case RS6000_BIF_STVX_V8HI:
> +    case RS6000_BIF_STVX_V4SI:
> +    case RS6000_BIF_STVX_V4SF:
> +    case RS6000_BIF_STVX_V2DI:
> +    case RS6000_BIF_STVX_V2DF:
> +    case RS6000_BIF_STXVW4X_V16QI:
> +    case RS6000_BIF_STXVW4X_V8HI:
> +    case RS6000_BIF_STXVW4X_V4SF:
> +    case RS6000_BIF_STXVW4X_V4SI:
> +    case RS6000_BIF_STXVD2X_V2DF:
> +    case RS6000_BIF_STXVD2X_V2DI:
> +      return true;
> +    default:
> +      return false;
> +    }
> +}
> +
>  /* Check whether a builtin function is supported in this target
>     configuration.  */
>  bool
> @@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
>    gcc_unreachable ();
>  }
> 
> +/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
> +   __vector_quad arguments into pass-by-value arguments, leading to more
> +   efficient code generation.  */
> +static bool
> +rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
> +				    rs6000_gen_builtins fn_code)
> +{
> +  gimple *stmt = gsi_stmt (*gsi);
> +  size_t fncode = (size_t) fn_code;
> +
> +  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
> +    return false;
> +
> +  /* Each call that can be gimple-expanded has an associated built-in
> +     function that it will expand into.  If this one doesn't, we have
> +     already expanded it!  */
> +  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
> +    return false;
> +
> +  bifdata *bd = &rs6000_builtin_info_x[fncode];
> +  unsigned nopnds = bd->nargs;
> +  gimple_seq new_seq = NULL;
> +  gimple *new_call;
> +  tree new_decl;
> +
> +  /* Compatibility built-ins; we used to call these
> +     __builtin_mma_{dis,}assemble_pair, but now we call them
> +     __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
> +  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
> +    fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
> +  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
> +    fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
> +
> +  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
> +      || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
> +    {
> +      /* This is an MMA disassemble built-in function.  */
> +      push_gimplify_context (true);
> +      unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
> +      tree dst_ptr = gimple_call_arg (stmt, 0);
> +      tree src_ptr = gimple_call_arg (stmt, 1);
> +      tree src_type = TREE_TYPE (src_ptr);
> +      tree src = create_tmp_reg_or_ssa_name (TREE_TYPE (src_type));
> +      gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
> +
> +      /* If we are not disassembling an accumulator/pair or our destination is
> +	 another accumulator/pair, then just copy the entire thing as is.  */
> +      if ((fncode == RS6000_BIF_DISASSEMBLE_ACC
> +	   && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
> +	  || (fncode == RS6000_BIF_DISASSEMBLE_PAIR_V
> +	      && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_pair_type_node))
> +	{
> +	  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
> +						   src_type, dst_ptr));
> +	  gimplify_assign (dst, src, &new_seq);
> +	  pop_gimplify_context (NULL);
> +	  gsi_replace_with_seq (gsi, new_seq, true);
> +	  return true;
> +	}
> +
> +      /* If we're disassembling an accumulator into a different type, we need
> +	 to emit a xxmfacc instruction now, since we cannot do it later.  */
> +      if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
> +	{
> +	  new_decl = rs6000_builtin_decls_x[RS6000_BIF_XXMFACC_INTERNAL];
> +	  new_call = gimple_build_call (new_decl, 1, src);
> +	  src = create_tmp_reg_or_ssa_name (vector_quad_type_node);
> +	  gimple_call_set_lhs (new_call, src);
> +	  gimple_seq_add_stmt (&new_seq, new_call);
> +	}
> +
> +      /* Copy the accumulator/pair vector by vector.  */
> +      new_decl
> +	= rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
> +      tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
> +						   ptr_mode, true);
> +      tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
> +      for (unsigned i = 0; i < nvec; i++)
> +	{
> +	  unsigned index = WORDS_BIG_ENDIAN ? i : nvec - 1 - i;
> +	  tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
> +			     build_int_cst (dst_type, index * 16));
> +	  tree dstssa = create_tmp_reg_or_ssa_name (unsigned_V16QI_type_node);
> +	  new_call = gimple_build_call (new_decl, 2, src,
> +					build_int_cstu (uint16_type_node, i));
> +	  gimple_call_set_lhs (new_call, dstssa);
> +	  gimple_seq_add_stmt (&new_seq, new_call);
> +	  gimplify_assign (dst, dstssa, &new_seq);
> +	}
> +      pop_gimplify_context (NULL);
> +      gsi_replace_with_seq (gsi, new_seq, true);
> +      return true;
> +    }
> +
> +  /* Convert this built-in into an internal version that uses pass-by-value
> +     arguments.  The internal built-in is found in the assoc_bif field.  */
> +  new_decl = rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
> +  tree lhs, op[MAX_MMA_OPERANDS];
> +  tree acc = gimple_call_arg (stmt, 0);
> +  push_gimplify_context (true);
> +
> +  if (bif_is_quad (*bd))
> +    {
> +      /* This built-in has a pass-by-reference accumulator input, so load it
> +	 into a temporary accumulator for use as a pass-by-value input.  */
> +      op[0] = create_tmp_reg_or_ssa_name (vector_quad_type_node);
> +      for (unsigned i = 1; i < nopnds; i++)
> +	op[i] = gimple_call_arg (stmt, i);
> +      gimplify_assign (op[0], build_simple_mem_ref (acc), &new_seq);
> +    }
> +  else
> +    {
> +      /* This built-in does not use its pass-by-reference accumulator argument
> +	 as an input argument, so remove it from the input list.  */
> +      nopnds--;
> +      for (unsigned i = 0; i < nopnds; i++)
> +	op[i] = gimple_call_arg (stmt, i + 1);
> +    }
> +
> +  switch (nopnds)
> +    {
> +    case 0:
> +      new_call = gimple_build_call (new_decl, 0);
> +      break;
> +    case 1:
> +      new_call = gimple_build_call (new_decl, 1, op[0]);
> +      break;
> +    case 2:
> +      new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
> +      break;
> +    case 3:
> +      new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
> +      break;
> +    case 4:
> +      new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
> +      break;
> +    case 5:
> +      new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
> +				    op[4]);
> +      break;
> +    case 6:
> +      new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
> +				    op[4], op[5]);
> +      break;
> +    case 7:
> +      new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
> +				    op[4], op[5], op[6]);
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +
> +  if (fncode == RS6000_BIF_BUILD_PAIR || fncode == RS6000_BIF_ASSEMBLE_PAIR_V)
> +    lhs = create_tmp_reg_or_ssa_name (vector_pair_type_node);
> +  else
> +    lhs = create_tmp_reg_or_ssa_name (vector_quad_type_node);
> +  gimple_call_set_lhs (new_call, lhs);
> +  gimple_seq_add_stmt (&new_seq, new_call);
> +  gimplify_assign (build_simple_mem_ref (acc), lhs, &new_seq);
> +  pop_gimplify_context (NULL);
> +  gsi_replace_with_seq (gsi, new_seq, true);
> +
> +  return true;
> +}

ok
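
For reference, the disassemble path above applies to a source-level
call like this (a sketch; assumes -mcpu=power10):

    #include <altivec.h>

    void
    disassemble_acc (vector unsigned char dst[4], __vector_quad *acc)
    {
      __builtin_mma_disassemble_acc (dst, acc);
    }

rewriting it into four vector copies (plus an xxmfacc) rather than a
block copy.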

> +
> +/* Fold a machine-dependent built-in in GIMPLE.  (For folding into
> +   a constant, use rs6000_fold_builtin.)  */
> +static bool
> +rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi)
> +{
> +  gimple *stmt = gsi_stmt (*gsi);
> +  tree fndecl = gimple_call_fndecl (stmt);
> +  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
> +  enum rs6000_gen_builtins fn_code
> +    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +  tree arg0, arg1, lhs, temp;
> +  enum tree_code bcode;
> +  gimple *g;
> +
> +  size_t uns_fncode = (size_t) fn_code;
> +  enum insn_code icode = rs6000_builtin_info_x[uns_fncode].icode;
> +  const char *fn_name1 = rs6000_builtin_info_x[uns_fncode].bifname;
> +  const char *fn_name2 = (icode != CODE_FOR_nothing)
> +			  ? get_insn_name ((int) icode)
> +			  : "nothing";
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +      fprintf (stderr, "rs6000_gimple_fold_new_builtin %d %s %s\n",
> +	       fn_code, fn_name1, fn_name2);
> +
> +  if (!rs6000_fold_gimple)
> +    return false;
> +
> +  /* Prevent gimple folding for code that does not have a LHS, unless it is
> +     allowed per the rs6000_new_builtin_valid_without_lhs helper function.  */
> +  if (!gimple_call_lhs (stmt)
> +      && !rs6000_new_builtin_valid_without_lhs (fn_code, fndecl))
> +    return false;
> +
> +  /* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it.  */
> +  if (!rs6000_new_builtin_is_supported (fn_code))
> +    return false;
> +
> +  if (rs6000_gimple_fold_new_mma_builtin (gsi, fn_code))
> +    return true;
> +
> +  switch (fn_code)
> +    {
> +    /* Flavors of vec_add.  We deliberately don't expand
> +       RS6000_BIF_VADDUQM as it gets lowered from V1TImode to
> +       TImode, resulting in much poorer code generation.  */
> +    case RS6000_BIF_VADDUBM:
> +    case RS6000_BIF_VADDUHM:
> +    case RS6000_BIF_VADDUWM:
> +    case RS6000_BIF_VADDUDM:
> +    case RS6000_BIF_VADDFP:
> +    case RS6000_BIF_XVADDDP:
> +    case RS6000_BIF_XVADDSP:
> +      bcode = PLUS_EXPR;
> +    do_binary:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (lhs)))
> +	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (lhs))))
> +	{
> +	  /* Ensure the binary operation is performed in a type
> +	     that wraps if it is integral type.  */
> +	  gimple_seq stmts = NULL;
> +	  tree type = unsigned_type_for (TREE_TYPE (lhs));
> +	  tree uarg0 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +				     type, arg0);
> +	  tree uarg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +				     type, arg1);
> +	  tree res = gimple_build (&stmts, gimple_location (stmt), bcode,
> +				   type, uarg0, uarg1);
> +	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR,
> +				   build1 (VIEW_CONVERT_EXPR,
> +					   TREE_TYPE (lhs), res));
> +	  gsi_replace (gsi, g, true);
> +	  return true;
> +	}
> +      g = gimple_build_assign (lhs, bcode, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
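
The unsigned detour above matters for signed element types; at source
level the fold amounts to roughly this (sketch):

    #include <altivec.h>

    vector signed int
    wrapping_vadd (vector signed int a, vector signed int b)
    {
      /* Do the add in the unsigned type so overflow wraps.  */
      vector unsigned int ua = (vector unsigned int) a;
      vector unsigned int ub = (vector unsigned int) b;
      return (vector signed int) (ua + ub);
    }
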
> +    /* Flavors of vec_sub.  We deliberately don't expand
> +       P8V_BUILTIN_VSUBUQM. */


Is there a new name by which to reference VSUBUQM in that comment?


> +    case RS6000_BIF_VSUBUBM:
> +    case RS6000_BIF_VSUBUHM:
> +    case RS6000_BIF_VSUBUWM:
> +    case RS6000_BIF_VSUBUDM:
> +    case RS6000_BIF_VSUBFP:
> +    case RS6000_BIF_XVSUBDP:
> +    case RS6000_BIF_XVSUBSP:
> +      bcode = MINUS_EXPR;
> +      goto do_binary;
> +    case RS6000_BIF_XVMULSP:
> +    case RS6000_BIF_XVMULDP:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, MULT_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Even element flavors of vec_mul (signed). */
> +    case RS6000_BIF_VMULESB:
> +    case RS6000_BIF_VMULESH:
> +    case RS6000_BIF_VMULESW:
> +    /* Even element flavors of vec_mul (unsigned).  */
> +    case RS6000_BIF_VMULEUB:
> +    case RS6000_BIF_VMULEUH:
> +    case RS6000_BIF_VMULEUW:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_EVEN_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Odd element flavors of vec_mul (signed).  */
> +    case RS6000_BIF_VMULOSB:
> +    case RS6000_BIF_VMULOSH:
> +    case RS6000_BIF_VMULOSW:
> +    /* Odd element flavors of vec_mul (unsigned). */
> +    case RS6000_BIF_VMULOUB:
> +    case RS6000_BIF_VMULOUH:
> +    case RS6000_BIF_VMULOUW:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_ODD_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_div (Integer).  */
> +    case RS6000_BIF_DIV_V2DI:
> +    case RS6000_BIF_UDIV_V2DI:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, TRUNC_DIV_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_div (Float).  */
> +    case RS6000_BIF_XVDIVSP:
> +    case RS6000_BIF_XVDIVDP:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, RDIV_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_and.  */
> +    case RS6000_BIF_VAND_V16QI_UNS:
> +    case RS6000_BIF_VAND_V16QI:
> +    case RS6000_BIF_VAND_V8HI_UNS:
> +    case RS6000_BIF_VAND_V8HI:
> +    case RS6000_BIF_VAND_V4SI_UNS:
> +    case RS6000_BIF_VAND_V4SI:
> +    case RS6000_BIF_VAND_V2DI_UNS:
> +    case RS6000_BIF_VAND_V2DI:
> +    case RS6000_BIF_VAND_V4SF:
> +    case RS6000_BIF_VAND_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_andc.  */
> +    case RS6000_BIF_VANDC_V16QI_UNS:
> +    case RS6000_BIF_VANDC_V16QI:
> +    case RS6000_BIF_VANDC_V8HI_UNS:
> +    case RS6000_BIF_VANDC_V8HI:
> +    case RS6000_BIF_VANDC_V4SI_UNS:
> +    case RS6000_BIF_VANDC_V4SI:
> +    case RS6000_BIF_VANDC_V2DI_UNS:
> +    case RS6000_BIF_VANDC_V2DI:
> +    case RS6000_BIF_VANDC_V4SF:
> +    case RS6000_BIF_VANDC_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
> +      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, temp);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_nand.  */
> +    case RS6000_BIF_NAND_V16QI_UNS:
> +    case RS6000_BIF_NAND_V16QI:
> +    case RS6000_BIF_NAND_V8HI_UNS:
> +    case RS6000_BIF_NAND_V8HI:
> +    case RS6000_BIF_NAND_V4SI_UNS:
> +    case RS6000_BIF_NAND_V4SI:
> +    case RS6000_BIF_NAND_V2DI_UNS:
> +    case RS6000_BIF_NAND_V2DI:
> +    case RS6000_BIF_NAND_V4SF:
> +    case RS6000_BIF_NAND_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
> +      g = gimple_build_assign (temp, BIT_AND_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_or.  */
> +    case RS6000_BIF_VOR_V16QI_UNS:
> +    case RS6000_BIF_VOR_V16QI:
> +    case RS6000_BIF_VOR_V8HI_UNS:
> +    case RS6000_BIF_VOR_V8HI:
> +    case RS6000_BIF_VOR_V4SI_UNS:
> +    case RS6000_BIF_VOR_V4SI:
> +    case RS6000_BIF_VOR_V2DI_UNS:
> +    case RS6000_BIF_VOR_V2DI:
> +    case RS6000_BIF_VOR_V4SF:
> +    case RS6000_BIF_VOR_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* flavors of vec_orc.  */
> +    case RS6000_BIF_ORC_V16QI_UNS:
> +    case RS6000_BIF_ORC_V16QI:
> +    case RS6000_BIF_ORC_V8HI_UNS:
> +    case RS6000_BIF_ORC_V8HI:
> +    case RS6000_BIF_ORC_V4SI_UNS:
> +    case RS6000_BIF_ORC_V4SI:
> +    case RS6000_BIF_ORC_V2DI_UNS:
> +    case RS6000_BIF_ORC_V2DI:
> +    case RS6000_BIF_ORC_V4SF:
> +    case RS6000_BIF_ORC_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
> +      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, temp);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_xor.  */
> +    case RS6000_BIF_VXOR_V16QI_UNS:
> +    case RS6000_BIF_VXOR_V16QI:
> +    case RS6000_BIF_VXOR_V8HI_UNS:
> +    case RS6000_BIF_VXOR_V8HI:
> +    case RS6000_BIF_VXOR_V4SI_UNS:
> +    case RS6000_BIF_VXOR_V4SI:
> +    case RS6000_BIF_VXOR_V2DI_UNS:
> +    case RS6000_BIF_VXOR_V2DI:
> +    case RS6000_BIF_VXOR_V4SF:
> +    case RS6000_BIF_VXOR_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, BIT_XOR_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_nor.  */
> +    case RS6000_BIF_VNOR_V16QI_UNS:
> +    case RS6000_BIF_VNOR_V16QI:
> +    case RS6000_BIF_VNOR_V8HI_UNS:
> +    case RS6000_BIF_VNOR_V8HI:
> +    case RS6000_BIF_VNOR_V4SI_UNS:
> +    case RS6000_BIF_VNOR_V4SI:
> +    case RS6000_BIF_VNOR_V2DI_UNS:
> +    case RS6000_BIF_VNOR_V2DI:
> +    case RS6000_BIF_VNOR_V4SF:
> +    case RS6000_BIF_VNOR_V2DF:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
> +      g = gimple_build_assign (temp, BIT_IOR_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* flavors of vec_abs.  */
> +    case RS6000_BIF_ABS_V16QI:
> +    case RS6000_BIF_ABS_V8HI:
> +    case RS6000_BIF_ABS_V4SI:
> +    case RS6000_BIF_ABS_V4SF:
> +    case RS6000_BIF_ABS_V2DI:
> +    case RS6000_BIF_XVABSDP:
> +    case RS6000_BIF_XVABSSP:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0)))
> +	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (arg0))))
> +	return false;
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, ABS_EXPR, arg0);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* flavors of vec_min.  */
> +    case RS6000_BIF_XVMINDP:
> +    case RS6000_BIF_XVMINSP:
> +    case RS6000_BIF_VMINSD:
> +    case RS6000_BIF_VMINUD:
> +    case RS6000_BIF_VMINSB:
> +    case RS6000_BIF_VMINSH:
> +    case RS6000_BIF_VMINSW:
> +    case RS6000_BIF_VMINUB:
> +    case RS6000_BIF_VMINUH:
> +    case RS6000_BIF_VMINUW:
> +    case RS6000_BIF_VMINFP:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, MIN_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* flavors of vec_max.  */
> +    case RS6000_BIF_XVMAXDP:
> +    case RS6000_BIF_XVMAXSP:
> +    case RS6000_BIF_VMAXSD:
> +    case RS6000_BIF_VMAXUD:
> +    case RS6000_BIF_VMAXSB:
> +    case RS6000_BIF_VMAXSH:
> +    case RS6000_BIF_VMAXSW:
> +    case RS6000_BIF_VMAXUB:
> +    case RS6000_BIF_VMAXUH:
> +    case RS6000_BIF_VMAXUW:
> +    case RS6000_BIF_VMAXFP:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, MAX_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_eqv.  */
> +    case RS6000_BIF_EQV_V16QI:
> +    case RS6000_BIF_EQV_V8HI:
> +    case RS6000_BIF_EQV_V4SI:
> +    case RS6000_BIF_EQV_V4SF:
> +    case RS6000_BIF_EQV_V2DF:
> +    case RS6000_BIF_EQV_V2DI:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
> +      g = gimple_build_assign (temp, BIT_XOR_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vec_rotate_left.  */
> +    case RS6000_BIF_VRLB:
> +    case RS6000_BIF_VRLH:
> +    case RS6000_BIF_VRLW:
> +    case RS6000_BIF_VRLD:
> +      arg0 = gimple_call_arg (stmt, 0);
> +      arg1 = gimple_call_arg (stmt, 1);
> +      lhs = gimple_call_lhs (stmt);
> +      g = gimple_build_assign (lhs, LROTATE_EXPR, arg0, arg1);
> +      gimple_set_location (g, gimple_location (stmt));
> +      gsi_replace (gsi, g, true);
> +      return true;
> +    /* Flavors of vector shift right algebraic.
> +       vec_sra{b,h,w} -> vsra{b,h,w}.  */
> +    case RS6000_BIF_VSRAB:
> +    case RS6000_BIF_VSRAH:
> +    case RS6000_BIF_VSRAW:
> +    case RS6000_BIF_VSRAD:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	arg1 = gimple_call_arg (stmt, 1);
> +	lhs = gimple_call_lhs (stmt);
> +	tree arg1_type = TREE_TYPE (arg1);
> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
> +	location_t loc = gimple_location (stmt);
> +	/* Force arg1 into the valid range matching the arg0 type.  */
> +	/* Build a vector consisting of the max valid bit-size values.  */
> +	int n_elts = VECTOR_CST_NELTS (arg1);
> +	tree element_size = build_int_cst (unsigned_element_type,
> +					   128 / n_elts);
> +	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
> +	for (int i = 0; i < n_elts; i++)
> +	  elts.safe_push (element_size);
> +	tree modulo_tree = elts.build ();
> +	/* Modulo the provided shift value against that vector.  */
> +	gimple_seq stmts = NULL;
> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +					   unsigned_arg1_type, arg1);
> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
> +				      unsigned_arg1_type, unsigned_arg1,
> +				      modulo_tree);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	/* And finally, do the shift.  */
> +	g = gimple_build_assign (lhs, RSHIFT_EXPR, arg0, new_arg1);
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
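
The TRUNC_MOD_EXPR here gives the shift count the hardware's modulo
semantics.  A hedged example, assuming 4 x 32-bit elements (so counts
reduce mod 32):

  #include <altivec.h>

  vector int
  sra_demo (vector int x)
  {
    /* Count 33 reduces mod 32, so each lane shifts right by 1.  */
    return vec_sra (x, vec_splats (33u));
  }
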
> +    /* Flavors of vector shift left.
> +       builtin_altivec_vsl{b,h,w} -> vsl{b,h,w}.  */
> +    case RS6000_BIF_VSLB:
> +    case RS6000_BIF_VSLH:
> +    case RS6000_BIF_VSLW:
> +    case RS6000_BIF_VSLD:
> +      {
> +	location_t loc;
> +	gimple_seq stmts = NULL;
> +	arg0 = gimple_call_arg (stmt, 0);
> +	tree arg0_type = TREE_TYPE (arg0);
> +	if (INTEGRAL_TYPE_P (TREE_TYPE (arg0_type))
> +	    && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0_type)))
> +	  return false;
> +	arg1 = gimple_call_arg (stmt, 1);
> +	tree arg1_type = TREE_TYPE (arg1);
> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
> +	loc = gimple_location (stmt);
> +	lhs = gimple_call_lhs (stmt);
> +	/* Force arg1 into the valid range matching the arg0 type.  */
> +	/* Build a vector consisting of the max valid bit-size values.  */
> +	int n_elts = VECTOR_CST_NELTS (arg1);
> +	int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg1_type))
> +				* BITS_PER_UNIT;
> +	tree element_size = build_int_cst (unsigned_element_type,
> +					   tree_size_in_bits / n_elts);
> +	tree_vector_builder elts (unsigned_type_for (arg1_type), n_elts, 1);
> +	for (int i = 0; i < n_elts; i++)
> +	  elts.safe_push (element_size);
> +	tree modulo_tree = elts.build ();
> +	/* Modulo the provided shift value against that vector.  */
> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +					   unsigned_arg1_type, arg1);
> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
> +				      unsigned_arg1_type, unsigned_arg1,
> +				      modulo_tree);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	/* And finally, do the shift.  */
> +	g = gimple_build_assign (lhs, LSHIFT_EXPR, arg0, new_arg1);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +    /* Flavors of vector shift right.  */
> +    case RS6000_BIF_VSRB:
> +    case RS6000_BIF_VSRH:
> +    case RS6000_BIF_VSRW:
> +    case RS6000_BIF_VSRD:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	arg1 = gimple_call_arg (stmt, 1);
> +	lhs = gimple_call_lhs (stmt);
> +	tree arg1_type = TREE_TYPE (arg1);
> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
> +	location_t loc = gimple_location (stmt);
> +	gimple_seq stmts = NULL;
> +	/* Convert arg0 to unsigned.  */
> +	tree arg0_unsigned
> +	  = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +			  unsigned_type_for (TREE_TYPE (arg0)), arg0);
> +	/* Force arg1 into the valid range matching the arg0 type.  */
> +	/* Build a vector consisting of the max valid bit-size values.  */
> +	int n_elts = VECTOR_CST_NELTS (arg1);
> +	tree element_size = build_int_cst (unsigned_element_type,
> +					   128 / n_elts);
> +	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
> +	for (int i = 0; i < n_elts; i++)
> +	  elts.safe_push (element_size);
> +	tree modulo_tree = elts.build ();
> +	/* Modulo the provided shift value against that vector.  */
> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
> +					   unsigned_arg1_type, arg1);
> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
> +				      unsigned_arg1_type, unsigned_arg1,
> +				      modulo_tree);
> +	/* Do the shift.  */
> +	tree res
> +	  = gimple_build (&stmts, RSHIFT_EXPR,
> +			  TREE_TYPE (arg0_unsigned), arg0_unsigned, new_arg1);
> +	/* Convert result back to the lhs type.  */
> +	res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	replace_call_with_value (gsi, res);
> +	return true;
> +      }
> +    /* Vector loads.  */
> +    case RS6000_BIF_LVX_V16QI:
> +    case RS6000_BIF_LVX_V8HI:
> +    case RS6000_BIF_LVX_V4SI:
> +    case RS6000_BIF_LVX_V4SF:
> +    case RS6000_BIF_LVX_V2DI:
> +    case RS6000_BIF_LVX_V2DF:
> +    case RS6000_BIF_LVX_V1TI:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);  // offset
> +	arg1 = gimple_call_arg (stmt, 1);  // address
> +	lhs = gimple_call_lhs (stmt);
> +	location_t loc = gimple_location (stmt);
> +	/* Since arg1 may be cast to a different type, just use ptr_type_node
> +	   here instead of trying to enforce TBAA on pointer types.  */
> +	tree arg1_type = ptr_type_node;
> +	tree lhs_type = TREE_TYPE (lhs);
> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +	   the tree using the value from arg0.  The resulting type will match
> +	   the type of arg1.  */
> +	gimple_seq stmts = NULL;
> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +				       arg1_type, arg1, temp_offset);
> +	/* Mask off any lower bits from the address.  */
> +	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
> +					  arg1_type, temp_addr,
> +					  build_int_cst (arg1_type, -16));
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	if (!is_gimple_mem_ref_addr (aligned_addr))
> +	  {
> +	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
> +	    gimple *g = gimple_build_assign (t, aligned_addr);
> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	    aligned_addr = t;
> +	  }
> +	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
> +	   take an offset, but since we've already incorporated the offset
> +	   above, here we just pass in a zero.  */
> +	gimple *g
> +	  = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
> +					      build_int_cst (arg1_type, 0)));
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
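
To make the BIT_AND_EXPR with -16 concrete: lvx ignores the low four
address bits, so the folded MEM_REF reads from the 16-byte-aligned
address.  A sketch, assuming the standard vec_ld mapping:

  #include <altivec.h>

  vector int
  lvx_demo (const int *p)
  {
    /* Loads from (p + 4) & ~15, per lvx semantics.  */
    return vec_ld (4, p);
  }
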
> +    /* Vector stores.  */
> +    case RS6000_BIF_STVX_V16QI:
> +    case RS6000_BIF_STVX_V8HI:
> +    case RS6000_BIF_STVX_V4SI:
> +    case RS6000_BIF_STVX_V4SF:
> +    case RS6000_BIF_STVX_V2DI:
> +    case RS6000_BIF_STVX_V2DF:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
> +	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
> +	location_t loc = gimple_location (stmt);
> +	tree arg0_type = TREE_TYPE (arg0);
> +	/* Use ptr_type_node (no TBAA) for the arg2_type.
> +	   FIXME: (Richard)  "A proper fix would be to transition this type as
> +	   seen from the frontend to GIMPLE, for example in a similar way we
> +	   do for MEM_REFs by piggy-backing that on an extra argument, a
> +	   constant zero pointer of the alias pointer type to use (which would
> +	   also serve as a type indicator of the store itself).  I'd use a
> +	   target specific internal function for this (not sure if we can have
> +	   those target specific, but I guess if it's folded away then that's
> +	   fine) and get away with the overload set."  */
> +	tree arg2_type = ptr_type_node;
> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +	   the tree using the value from arg1.  The resulting type will match
> +	   the type of arg2.  */
> +	gimple_seq stmts = NULL;
> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +				       arg2_type, arg2, temp_offset);
> +	/* Mask off any lower bits from the address.  */
> +	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
> +					  arg2_type, temp_addr,
> +					  build_int_cst (arg2_type, -16));
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	if (!is_gimple_mem_ref_addr (aligned_addr))
> +	  {
> +	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
> +	    gimple *g = gimple_build_assign (t, aligned_addr);
> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	    aligned_addr = t;
> +	  }
> +	/* The desired gimple result should be similar to:
> +	   MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
> +	gimple *g
> +	  = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
> +					 build_int_cst (arg2_type, 0)), arg0);
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +
> +    /* Unaligned vector loads.  */
> +    case RS6000_BIF_LXVW4X_V16QI:
> +    case RS6000_BIF_LXVW4X_V8HI:
> +    case RS6000_BIF_LXVW4X_V4SF:
> +    case RS6000_BIF_LXVW4X_V4SI:
> +    case RS6000_BIF_LXVD2X_V2DF:
> +    case RS6000_BIF_LXVD2X_V2DI:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);  // offset
> +	arg1 = gimple_call_arg (stmt, 1);  // address
> +	lhs = gimple_call_lhs (stmt);
> +	location_t loc = gimple_location (stmt);
> +	/* Since arg1 may be cast to a different type, just use ptr_type_node
> +	   here instead of trying to enforce TBAA on pointer types.  */
> +	tree arg1_type = ptr_type_node;
> +	tree lhs_type = TREE_TYPE (lhs);
> +	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
> +	   required alignment (power) is 4 bytes regardless of data type.  */
> +	tree align_ltype = build_aligned_type (lhs_type, 4);
> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +	   the tree using the value from arg0.  The resulting type will match
> +	   the type of arg1.  */
> +	gimple_seq stmts = NULL;
> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +				       arg1_type, arg1, temp_offset);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	if (!is_gimple_mem_ref_addr (temp_addr))
> +	  {
> +	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
> +	    gimple *g = gimple_build_assign (t, temp_addr);
> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	    temp_addr = t;
> +	  }
> +	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
> +	   take an offset, but since we've already incorporated the offset
> +	   above, here we just pass in a zero.  */
> +	gimple *g;
> +	g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
> +					      build_int_cst (arg1_type, 0)));
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +
> +    /* Unaligned vector stores.  */
> +    case RS6000_BIF_STXVW4X_V16QI:
> +    case RS6000_BIF_STXVW4X_V8HI:
> +    case RS6000_BIF_STXVW4X_V4SF:
> +    case RS6000_BIF_STXVW4X_V4SI:
> +    case RS6000_BIF_STXVD2X_V2DF:
> +    case RS6000_BIF_STXVD2X_V2DI:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
> +	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
> +	location_t loc = gimple_location (stmt);
> +	tree arg0_type = TREE_TYPE (arg0);
> +	/* Use ptr_type_node (no TBAA) for the arg2_type.  */
> +	tree arg2_type = ptr_type_node;
> +	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
> +	   required alignment (power) is 4 bytes regardless of data type.  */
> +	tree align_stype = build_aligned_type (arg0_type, 4);
> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
> +	   the tree using the value from arg1.  */
> +	gimple_seq stmts = NULL;
> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
> +				       arg2_type, arg2, temp_offset);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	if (!is_gimple_mem_ref_addr (temp_addr))
> +	  {
> +	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
> +	    gimple *g = gimple_build_assign (t, temp_addr);
> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
> +	    temp_addr = t;
> +	  }
> +	gimple *g;
> +	g = gimple_build_assign (build2 (MEM_REF, align_stype, temp_addr,
> +					 build_int_cst (arg2_type, 0)), arg0);
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +
> +    /* Vector Fused multiply-add (fma).  */
> +    case RS6000_BIF_VMADDFP:
> +    case RS6000_BIF_XVMADDDP:
> +    case RS6000_BIF_XVMADDSP:

I notice that XVMADDSP was missing in the original. 


> +    case RS6000_BIF_VMLADDUHM:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	arg1 = gimple_call_arg (stmt, 1);
> +	tree arg2 = gimple_call_arg (stmt, 2);
> +	lhs = gimple_call_lhs (stmt);
> +	gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
> +	gimple_call_set_lhs (g, lhs);
> +	gimple_call_set_nothrow (g, true);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
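
So the multiply-add flavors all funnel into the FMA internal function.
A sketch of the user-visible pattern, assuming the standard vec_madd
overload:

  #include <altivec.h>

  vector float
  fma_demo (vector float a, vector float b, vector float c)
  {
    return vec_madd (a, b, c);  /* becomes .FMA (a, b, c) in GIMPLE  */
  }
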
> +
> +    /* Vector compares; EQ, NE, GE, GT, LE.  */
> +    case RS6000_BIF_VCMPEQUB:
> +    case RS6000_BIF_VCMPEQUH:
> +    case RS6000_BIF_VCMPEQUW:
> +    case RS6000_BIF_VCMPEQUD:
> +    /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
> +       folding produces worse code for 128-bit compares.  */

ok

> +      fold_compare_helper (gsi, EQ_EXPR, stmt);
> +      return true;
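
A sketch of what the compare folding applies to (assuming the usual
vec_cmpeq overloads); the builtin call becomes a generic vector
comparison that later GIMPLE passes can simplify or combine:

  #include <altivec.h>

  vector bool int
  cmpeq_demo (vector int a, vector int b)
  {
    return vec_cmpeq (a, b);  /* folds to a vector a == b compare  */
  }
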
> +
> +    case RS6000_BIF_VCMPNEB:
> +    case RS6000_BIF_VCMPNEH:
> +    case RS6000_BIF_VCMPNEW:
> +    /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
> +       folding produces worse code for 128-bit compares.  */
> +      fold_compare_helper (gsi, NE_EXPR, stmt);
> +      return true;
> +
> +    case RS6000_BIF_CMPGE_16QI:
> +    case RS6000_BIF_CMPGE_U16QI:
> +    case RS6000_BIF_CMPGE_8HI:
> +    case RS6000_BIF_CMPGE_U8HI:
> +    case RS6000_BIF_CMPGE_4SI:
> +    case RS6000_BIF_CMPGE_U4SI:
> +    case RS6000_BIF_CMPGE_2DI:
> +    case RS6000_BIF_CMPGE_U2DI:
> +    /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
> +       for now, because gimple folding produces worse code for 128-bit
> +       compares.  */
> +      fold_compare_helper (gsi, GE_EXPR, stmt);
> +      return true;
> +
> +    case RS6000_BIF_VCMPGTSB:
> +    case RS6000_BIF_VCMPGTUB:
> +    case RS6000_BIF_VCMPGTSH:
> +    case RS6000_BIF_VCMPGTUH:
> +    case RS6000_BIF_VCMPGTSW:
> +    case RS6000_BIF_VCMPGTUW:
> +    case RS6000_BIF_VCMPGTUD:
> +    case RS6000_BIF_VCMPGTSD:
> +    /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
> +       for now, because gimple folding produces worse code for 128-bit
> +       compares.  */
> +      fold_compare_helper (gsi, GT_EXPR, stmt);
> +      return true;
> +
> +    case RS6000_BIF_CMPLE_16QI:
> +    case RS6000_BIF_CMPLE_U16QI:
> +    case RS6000_BIF_CMPLE_8HI:
> +    case RS6000_BIF_CMPLE_U8HI:
> +    case RS6000_BIF_CMPLE_4SI:
> +    case RS6000_BIF_CMPLE_U4SI:
> +    case RS6000_BIF_CMPLE_2DI:
> +    case RS6000_BIF_CMPLE_U2DI:
> +    /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
> +       for now, because gimple folding produces worse code for 128-bit
> +       compares.  */
> +      fold_compare_helper (gsi, LE_EXPR, stmt);
> +      return true;
> +
> +    /* Flavors of vec_splat_[us]{8,16,32}.  */
> +    case RS6000_BIF_VSPLTISB:
> +    case RS6000_BIF_VSPLTISH:
> +    case RS6000_BIF_VSPLTISW:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	lhs = gimple_call_lhs (stmt);
> +
> +	/* Only fold the vec_splat_*() if the lower bits of arg 0 form a
> +	   5-bit signed constant in the range -16 to +15.  */
> +	if (TREE_CODE (arg0) != INTEGER_CST
> +	    || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))
> +	  return false;
> +	gimple_seq stmts = NULL;
> +	location_t loc = gimple_location (stmt);
> +	tree splat_value = gimple_convert (&stmts, loc,
> +					   TREE_TYPE (TREE_TYPE (lhs)), arg0);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
> +	g = gimple_build_assign (lhs, splat_tree);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
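
For the record, only literal operands that fit the 5-bit signed range
fold; e.g. (a sketch):

  #include <altivec.h>

  vector signed char
  splatis_demo (void)
  {
    return vec_splat_s8 (5);  /* folds to the constant {5, 5, ..., 5}  */
  }
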
> +
> +    /* Flavors of vec_splat.  */
> +    /* a = vec_splat (b, 0x3) becomes a = { b[3],b[3],b[3],...};  */
> +    case RS6000_BIF_VSPLTB:
> +    case RS6000_BIF_VSPLTH:
> +    case RS6000_BIF_VSPLTW:
> +    case RS6000_BIF_XXSPLTD_V2DI:
> +    case RS6000_BIF_XXSPLTD_V2DF:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
> +	/* Only fold vec_splat () if arg1 is both a constant value and
> +	   a valid index into the arg0 vector.  */
> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
> +	if (TREE_CODE (arg1) != INTEGER_CST
> +	    || TREE_INT_CST_LOW (arg1) > (n_elts - 1))
> +	  return false;
> +	lhs = gimple_call_lhs (stmt);
> +	tree lhs_type = TREE_TYPE (lhs);
> +	tree arg0_type = TREE_TYPE (arg0);
> +	tree splat;
> +	if (TREE_CODE (arg0) == VECTOR_CST)
> +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
> +	else
> +	  {
> +	    /* Determine (in bits) the length and start location of the
> +	       splat value for a call to the tree_vec_extract helper.  */
> +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
> +				  * BITS_PER_UNIT / n_elts;
> +	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
> +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
> +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
> +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
> +				      len, start);
> +	  }
> +	/* And finally, build the new vector.  */
> +	tree splat_tree = build_vector_from_val (lhs_type, splat);
> +	g = gimple_build_assign (lhs, splat_tree);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
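
And the companion splat-an-element case, which likewise needs a
constant in-range index (sketch; index 3 of 0..3 here):

  #include <altivec.h>

  vector int
  splat_demo (vector int v)
  {
    return vec_splat (v, 3);  /* folds to { v[3], v[3], v[3], v[3] }  */
  }
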
> +
> +    /* vec_mergel (integrals).  */
> +    case RS6000_BIF_VMRGLH:
> +    case RS6000_BIF_VMRGLW:
> +    case RS6000_BIF_XXMRGLW_4SI:
> +    case RS6000_BIF_VMRGLB:
> +    case RS6000_BIF_VEC_MERGEL_V2DI:
> +    case RS6000_BIF_XXMRGLW_4SF:
> +    case RS6000_BIF_VEC_MERGEL_V2DF:
> +      fold_mergehl_helper (gsi, stmt, 1);
> +      return true;
> +    /* vec_mergeh (integrals).  */
> +    case RS6000_BIF_VMRGHH:
> +    case RS6000_BIF_VMRGHW:
> +    case RS6000_BIF_XXMRGHW_4SI:
> +    case RS6000_BIF_VMRGHB:
> +    case RS6000_BIF_VEC_MERGEH_V2DI:
> +    case RS6000_BIF_XXMRGHW_4SF:
> +    case RS6000_BIF_VEC_MERGEH_V2DF:
> +      fold_mergehl_helper (gsi, stmt, 0);
> +      return true;
> +
> +    /* Flavors of vec_mergee.  */
> +    case RS6000_BIF_VMRGEW_V4SI:
> +    case RS6000_BIF_VMRGEW_V2DI:
> +    case RS6000_BIF_VMRGEW_V4SF:
> +    case RS6000_BIF_VMRGEW_V2DF:
> +      fold_mergeeo_helper (gsi, stmt, 0);
> +      return true;
> +    /* Flavors of vec_mergeo.  */
> +    case RS6000_BIF_VMRGOW_V4SI:
> +    case RS6000_BIF_VMRGOW_V2DI:
> +    case RS6000_BIF_VMRGOW_V4SF:
> +    case RS6000_BIF_VMRGOW_V2DF:
> +      fold_mergeeo_helper (gsi, stmt, 1);
> +      return true;
> +
> +    /* d = vec_pack (a, b) */
> +    case RS6000_BIF_VPKUDUM:
> +    case RS6000_BIF_VPKUHUM:
> +    case RS6000_BIF_VPKUWUM:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	arg1 = gimple_call_arg (stmt, 1);
> +	lhs = gimple_call_lhs (stmt);
> +	gimple *g = gimple_build_assign (lhs, VEC_PACK_TRUNC_EXPR, arg0, arg1);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +
> +    /* d = vec_unpackh (a) */
> +    /* Note that the UNPACK_{HI,LO}_EXPR used in the gimple_build_assign call
> +       in this code is sensitive to endianness, and needs to be inverted to
> +       handle both LE and BE targets.  */
> +    case RS6000_BIF_VUPKHSB:
> +    case RS6000_BIF_VUPKHSH:
> +    case RS6000_BIF_VUPKHSW:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	lhs = gimple_call_lhs (stmt);
> +	if (BYTES_BIG_ENDIAN)
> +	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
> +	else
> +	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +    /* d = vec_unpackl (a) */
> +    case RS6000_BIF_VUPKLSB:
> +    case RS6000_BIF_VUPKLSH:
> +    case RS6000_BIF_VUPKLSW:
> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	lhs = gimple_call_lhs (stmt);
> +	if (BYTES_BIG_ENDIAN)
> +	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
> +	else
> +	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
> +	gimple_set_location (g, gimple_location (stmt));
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
> +    /* There is no gimple type corresponding to pixel, so just return.  */
> +    case RS6000_BIF_VUPKHPX:
> +    case RS6000_BIF_VUPKLPX:
> +      return false;
> +
> +    /* vec_perm.  */
> +    case RS6000_BIF_VPERM_16QI:
> +    case RS6000_BIF_VPERM_8HI:
> +    case RS6000_BIF_VPERM_4SI:
> +    case RS6000_BIF_VPERM_2DI:
> +    case RS6000_BIF_VPERM_4SF:
> +    case RS6000_BIF_VPERM_2DF:

> +    case RS6000_BIF_VPERM_16QI_UNS:
> +    case RS6000_BIF_VPERM_8HI_UNS:
> +    case RS6000_BIF_VPERM_4SI_UNS:
> +    case RS6000_BIF_VPERM_2DI_UNS:

Noting that the _UNS entries are new with respect to the original code.
ok.

> +      {
> +	arg0 = gimple_call_arg (stmt, 0);
> +	arg1 = gimple_call_arg (stmt, 1);
> +	tree permute = gimple_call_arg (stmt, 2);
> +	lhs = gimple_call_lhs (stmt);
> +	location_t loc = gimple_location (stmt);
> +	gimple_seq stmts = NULL;
> +	// Convert arg0 and arg1 to match the type of the permute
> +	// for the VEC_PERM_EXPR operation.
> +	tree permute_type = TREE_TYPE (permute);
> +	tree arg0_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
> +					permute_type, arg0);
> +	tree arg1_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
> +					permute_type, arg1);
> +	tree lhs_ptype = gimple_build (&stmts, loc, VEC_PERM_EXPR,
> +				      permute_type, arg0_ptype, arg1_ptype,
> +				      permute);
> +	// Convert the result back to the desired lhs type upon completion.
> +	tree temp = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
> +				  TREE_TYPE (lhs), lhs_ptype);
> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
> +	g = gimple_build_assign (lhs, temp);
> +	gimple_set_location (g, loc);
> +	gsi_replace (gsi, g, true);
> +	return true;
> +      }
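
Tying back to the _UNS note above: with those entries added, the
unsigned overload now folds the same way.  Sketch:

  #include <altivec.h>

  vector unsigned char
  perm_demo (vector unsigned char a, vector unsigned char b,
             vector unsigned char pcv)
  {
    return vec_perm (a, b, pcv);  /* folds to VEC_PERM_EXPR  */
  }
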
> +
> +    default:
> +      if (TARGET_DEBUG_BUILTIN)
> +	fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
> +		 fn_code, fn_name1, fn_name2);
> +      break;
> +    }
> +
> +  return false;
> +}



lgtm
thanks
-Will


> +
>  /* Expand an expression EXP that calls a built-in function,
>     with result going to TARGET if that's convenient
>     (and in mode MODE if that's convenient).


^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes
  2021-09-01 16:13 ` [PATCH 04/18] rs6000: Handle some recent MMA builtin changes Bill Schmidt
@ 2021-09-13 19:02   ` will schmidt
  2021-09-16 23:38   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: will schmidt @ 2021-09-13 19:02 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches; +Cc: dje.gcc, segher

On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
> __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
> I had been using to automate gimple folding of MMA builtins.  Previously,
> every MMA function that could be folded had an associated internal function
> that it was folded into.  The LXVP/STXVP builtins are just folded directly
> into memory operations.
> 
> Instead of relying on this pattern, this patch adds a new attribute to
> builtins called "mmaint," which is set for all MMA builtins that have an
> associated internal builtin.  The naming convention that adds _INTERNAL to
> the builtin index name remains.
> 
> The rest of the patch is just duplicating Peter's patch, using the new
> builtin infrastructure.
> 
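For context, the user-level pattern that now folds straight to a memory
access looks like this (a sketch, assuming MMA support is enabled,
e.g. -mcpu=power10):

  __vector_pair
  lxvp_demo (const __vector_pair *p, unsigned long off)
  {
    return __builtin_vsx_lxvp (off, p);
  }
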
> 2021-08-23  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000-builtin-new.def (ASSEMBLE_ACC): Add mmaint flag.
> 	(ASSEMBLE_PAIR): Likewise.
> 	(BUILD_ACC): Likewise.
> 	(DISASSEMBLE_ACC): Likewise.
> 	(DISASSEMBLE_PAIR): Likewise.
> 	(PMXVBF16GER2): Likewise.
> 	(PMXVBF16GER2NN): Likewise.
> 	(PMXVBF16GER2NP): Likewise.
> 	(PMXVBF16GER2PN): Likewise.
> 	(PMXVBF16GER2PP): Likewise.
> 	(PMXVF16GER2): Likewise.
> 	(PMXVF16GER2NN): Likewise.
> 	(PMXVF16GER2NP): Likewise.
> 	(PMXVF16GER2PN): Likewise.
> 	(PMXVF16GER2PP): Likewise.
> 	(PMXVF32GER): Likewise.
> 	(PMXVF32GERNN): Likewise.
> 	(PMXVF32GERNP): Likewise.
> 	(PMXVF32GERPN): Likewise.
> 	(PMXVF32GERPP): Likewise.
> 	(PMXVF64GER): Likewise.
> 	(PMXVF64GERNN): Likewise.
> 	(PMXVF64GERNP): Likewise.
> 	(PMXVF64GERPN): Likewise.
> 	(PMXVF64GERPP): Likewise.
> 	(PMXVI16GER2): Likewise.
> 	(PMXVI16GER2PP): Likewise.
> 	(PMXVI16GER2S): Likewise.
> 	(PMXVI16GER2SPP): Likewise.
> 	(PMXVI4GER8): Likewise.
> 	(PMXVI4GER8PP): Likewise.
> 	(PMXVI8GER4): Likewise.
> 	(PMXVI8GER4PP): Likewise.
> 	(PMXVI8GER4SPP): Likewise.
> 	(XVBF16GER2): Likewise.
> 	(XVBF16GER2NN): Likewise.
> 	(XVBF16GER2NP): Likewise.
> 	(XVBF16GER2PN): Likewise.
> 	(XVBF16GER2PP): Likewise.
> 	(XVF16GER2): Likewise.
> 	(XVF16GER2NN): Likewise.
> 	(XVF16GER2NP): Likewise.
> 	(XVF16GER2PN): Likewise.
> 	(XVF16GER2PP): Likewise.
> 	(XVF32GER): Likewise.
> 	(XVF32GERNN): Likewise.
> 	(XVF32GERNP): Likewise.
> 	(XVF32GERPN): Likewise.
> 	(XVF32GERPP): Likewise.
> 	(XVF64GER): Likewise.
> 	(XVF64GERNN): Likewise.
> 	(XVF64GERNP): Likewise.
> 	(XVF64GERPN): Likewise.
> 	(XVF64GERPP): Likewise.
> 	(XVI16GER2): Likewise.
> 	(XVI16GER2PP): Likewise.
> 	(XVI16GER2S): Likewise.
> 	(XVI16GER2SPP): Likewise.
> 	(XVI4GER8): Likewise.
> 	(XVI4GER8PP): Likewise.
> 	(XVI8GER4): Likewise.
> 	(XVI8GER4PP): Likewise.
> 	(XVI8GER4SPP): Likewise.
> 	(XXMFACC): Likewise.
> 	(XXMTACC): Likewise.
> 	(XXSETACCZ): Likewise.
> 	(ASSEMBLE_PAIR_V): Likewise.
> 	(BUILD_PAIR): Likewise.
> 	(DISASSEMBLE_PAIR_V): Likewise.
> 	(LXVP): New.
> 	(STXVP): New.

ok

> 	* config/rs6000/rs6000-call.c
> 	(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
> 	RS6000_BIF_STXVP.
> 	* config/rs6000/rs6000-gen-builtins.c (attrinfo): Add ismmaint.
> 	(parse_bif_attrs): Handle ismmaint.
> 	(write_decls): Add bif_mmaint_bit and bif_is_mmaint.
> 	(write_bif_static_init): Handle ismmaint.

ok

> ---
>  gcc/config/rs6000/rs6000-builtin-new.def | 145 ++++++++++++-----------
>  gcc/config/rs6000/rs6000-call.c          |  38 +++++-
>  gcc/config/rs6000/rs6000-gen-builtins.c  |  38 +++---
>  3 files changed, 135 insertions(+), 86 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin-new.def b/gcc/config/rs6000/rs6000-builtin-new.def
> index a8c6b9e988f..1966516551e 100644
> --- a/gcc/config/rs6000/rs6000-builtin-new.def
> +++ b/gcc/config/rs6000/rs6000-builtin-new.def
> @@ -129,6 +129,7 @@
>  ;   mma      Needs special handling for MMA
>  ;   quad     MMA instruction using a register quad as an input operand
>  ;   pair     MMA instruction using a register pair as an input operand
> +;   mmaint   MMA instruction expanding to internal call at GIMPLE time
>  ;   no32bit  Not valid for TARGET_32BIT
>  ;   32bit    Requires different handling for TARGET_32BIT
>  ;   cpu      This is a "cpu_is" or "cpu_supports" builtin
> @@ -3584,415 +3585,421 @@
> 
>  [mma]
>    void __builtin_mma_assemble_acc (v512 *, vuc, vuc, vuc, vuc);
> -    ASSEMBLE_ACC nothing {mma}
> +    ASSEMBLE_ACC nothing {mma,mmaint}
> 
>    v512 __builtin_mma_assemble_acc_internal (vuc, vuc, vuc, vuc);
>      ASSEMBLE_ACC_INTERNAL mma_assemble_acc {mma}
> 
>    void __builtin_mma_assemble_pair (v256 *, vuc, vuc);
> -    ASSEMBLE_PAIR nothing {mma}
> +    ASSEMBLE_PAIR nothing {mma,mmaint}
> 
>    v256 __builtin_mma_assemble_pair_internal (vuc, vuc);
>      ASSEMBLE_PAIR_INTERNAL vsx_assemble_pair {mma}
> 
>    void __builtin_mma_build_acc (v512 *, vuc, vuc, vuc, vuc);
> -    BUILD_ACC nothing {mma}
> +    BUILD_ACC nothing {mma,mmaint}
> 
>    v512 __builtin_mma_build_acc_internal (vuc, vuc, vuc, vuc);
>      BUILD_ACC_INTERNAL mma_assemble_acc {mma}
> 
>    void __builtin_mma_disassemble_acc (void *, v512 *);
> -    DISASSEMBLE_ACC nothing {mma,quad}
> +    DISASSEMBLE_ACC nothing {mma,quad,mmaint}
> 
>    vuc __builtin_mma_disassemble_acc_internal (v512, const int<2>);
>      DISASSEMBLE_ACC_INTERNAL mma_disassemble_acc {mma}
> 
>    void __builtin_mma_disassemble_pair (void *, v256 *);
> -    DISASSEMBLE_PAIR nothing {mma,pair}
> +    DISASSEMBLE_PAIR nothing {mma,pair,mmaint}
> 
>    vuc __builtin_mma_disassemble_pair_internal (v256, const int<2>);
>      DISASSEMBLE_PAIR_INTERNAL vsx_disassemble_pair {mma}
> 
>    void __builtin_mma_pmxvbf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVBF16GER2 nothing {mma}
> +    PMXVBF16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvbf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVBF16GER2_INTERNAL mma_pmxvbf16ger2 {mma}
> 
>    void __builtin_mma_pmxvbf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVBF16GER2NN nothing {mma,quad}
> +    PMXVBF16GER2NN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvbf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVBF16GER2NN_INTERNAL mma_pmxvbf16ger2nn {mma,quad}
> 
>    void __builtin_mma_pmxvbf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVBF16GER2NP nothing {mma,quad}
> +    PMXVBF16GER2NP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvbf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVBF16GER2NP_INTERNAL mma_pmxvbf16ger2np {mma,quad}
> 
>    void __builtin_mma_pmxvbf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVBF16GER2PN nothing {mma,quad}
> +    PMXVBF16GER2PN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvbf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVBF16GER2PN_INTERNAL mma_pmxvbf16ger2pn {mma,quad}
> 
>    void __builtin_mma_pmxvbf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVBF16GER2PP nothing {mma,quad}
> +    PMXVBF16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvbf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVBF16GER2PP_INTERNAL mma_pmxvbf16ger2pp {mma,quad}
> 
>    void __builtin_mma_pmxvf16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVF16GER2 nothing {mma}
> +    PMXVF16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvf16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVF16GER2_INTERNAL mma_pmxvf16ger2 {mma}
> 
>    void __builtin_mma_pmxvf16ger2nn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVF16GER2NN nothing {mma,quad}
> +    PMXVF16GER2NN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf16ger2nn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVF16GER2NN_INTERNAL mma_pmxvf16ger2nn {mma,quad}
> 
>    void __builtin_mma_pmxvf16ger2np (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVF16GER2NP nothing {mma,quad}
> +    PMXVF16GER2NP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf16ger2np_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVF16GER2NP_INTERNAL mma_pmxvf16ger2np {mma,quad}
> 
>    void __builtin_mma_pmxvf16ger2pn (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVF16GER2PN nothing {mma,quad}
> +    PMXVF16GER2PN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf16ger2pn_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVF16GER2PN_INTERNAL mma_pmxvf16ger2pn {mma,quad}
> 
>    void __builtin_mma_pmxvf16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVF16GER2PP nothing {mma,quad}
> +    PMXVF16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVF16GER2PP_INTERNAL mma_pmxvf16ger2pp {mma,quad}
> 
>    void __builtin_mma_pmxvf32ger (v512 *, vuc, vuc, const int<4>, const int<4>);
> -    PMXVF32GER nothing {mma}
> +    PMXVF32GER nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvf32ger_internal (vuc, vuc, const int<4>, const int<4>);
>      PMXVF32GER_INTERNAL mma_pmxvf32ger {mma}
> 
>    void __builtin_mma_pmxvf32gernn (v512 *, vuc, vuc, const int<4>, const int<4>);
> -    PMXVF32GERNN nothing {mma,quad}
> +    PMXVF32GERNN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf32gernn_internal (v512, vuc, vuc, const int<4>, const int<4>);
>      PMXVF32GERNN_INTERNAL mma_pmxvf32gernn {mma,quad}
> 
>    void __builtin_mma_pmxvf32gernp (v512 *, vuc, vuc, const int<4>, const int<4>);
> -    PMXVF32GERNP nothing {mma,quad}
> +    PMXVF32GERNP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf32gernp_internal (v512, vuc, vuc, const int<4>, const int<4>);
>      PMXVF32GERNP_INTERNAL mma_pmxvf32gernp {mma,quad}
> 
>    void __builtin_mma_pmxvf32gerpn (v512 *, vuc, vuc, const int<4>, const int<4>);
> -    PMXVF32GERPN nothing {mma,quad}
> +    PMXVF32GERPN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf32gerpn_internal (v512, vuc, vuc, const int<4>, const int<4>);
>      PMXVF32GERPN_INTERNAL mma_pmxvf32gerpn {mma,quad}
> 
>    void __builtin_mma_pmxvf32gerpp (v512 *, vuc, vuc, const int<4>, const int<4>);
> -    PMXVF32GERPP nothing {mma,quad}
> +    PMXVF32GERPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf32gerpp_internal (v512, vuc, vuc, const int<4>, const int<4>);
>      PMXVF32GERPP_INTERNAL mma_pmxvf32gerpp {mma,quad}
> 
>    void __builtin_mma_pmxvf64ger (v512 *, v256, vuc, const int<4>, const int<2>);
> -    PMXVF64GER nothing {mma,pair}
> +    PMXVF64GER nothing {mma,pair,mmaint}
> 
>    v512 __builtin_mma_pmxvf64ger_internal (v256, vuc, const int<4>, const int<2>);
>      PMXVF64GER_INTERNAL mma_pmxvf64ger {mma,pair}
> 
>    void __builtin_mma_pmxvf64gernn (v512 *, v256, vuc, const int<4>, const int<2>);
> -    PMXVF64GERNN nothing {mma,pair,quad}
> +    PMXVF64GERNN nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf64gernn_internal (v512, v256, vuc, const int<4>, const int<2>);
>      PMXVF64GERNN_INTERNAL mma_pmxvf64gernn {mma,pair,quad}
> 
>    void __builtin_mma_pmxvf64gernp (v512 *, v256, vuc, const int<4>, const int<2>);
> -    PMXVF64GERNP nothing {mma,pair,quad}
> +    PMXVF64GERNP nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf64gernp_internal (v512, v256, vuc, const int<4>, const int<2>);
>      PMXVF64GERNP_INTERNAL mma_pmxvf64gernp {mma,pair,quad}
> 
>    void __builtin_mma_pmxvf64gerpn (v512 *, v256, vuc, const int<4>, const int<2>);
> -    PMXVF64GERPN nothing {mma,pair,quad}
> +    PMXVF64GERPN nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf64gerpn_internal (v512, v256, vuc, const int<4>, const int<2>);
>      PMXVF64GERPN_INTERNAL mma_pmxvf64gerpn {mma,pair,quad}
> 
>    void __builtin_mma_pmxvf64gerpp (v512 *, v256, vuc, const int<4>, const int<2>);
> -    PMXVF64GERPP nothing {mma,pair,quad}
> +    PMXVF64GERPP nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvf64gerpp_internal (v512, v256, vuc, const int<4>, const int<2>);
>      PMXVF64GERPP_INTERNAL mma_pmxvf64gerpp {mma,pair,quad}
> 
>    void __builtin_mma_pmxvi16ger2 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVI16GER2 nothing {mma}
> +    PMXVI16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvi16ger2_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVI16GER2_INTERNAL mma_pmxvi16ger2 {mma}
> 
>    void __builtin_mma_pmxvi16ger2pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVI16GER2PP nothing {mma,quad}
> +    PMXVI16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvi16ger2pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVI16GER2PP_INTERNAL mma_pmxvi16ger2pp {mma,quad}
> 
>    void __builtin_mma_pmxvi16ger2s (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVI16GER2S nothing {mma}
> +    PMXVI16GER2S nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvi16ger2s_internal (vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVI16GER2S_INTERNAL mma_pmxvi16ger2s {mma}
> 
>    void __builtin_mma_pmxvi16ger2spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<2>);
> -    PMXVI16GER2SPP nothing {mma,quad}
> +    PMXVI16GER2SPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvi16ger2spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<2>);
>      PMXVI16GER2SPP_INTERNAL mma_pmxvi16ger2spp {mma,quad}
> 
>    void __builtin_mma_pmxvi4ger8 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<8>);
> -    PMXVI4GER8 nothing {mma}
> +    PMXVI4GER8 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvi4ger8_internal (vuc, vuc, const int<4>, const int<4>, const int<8>);
>      PMXVI4GER8_INTERNAL mma_pmxvi4ger8 {mma}
> 
>    void __builtin_mma_pmxvi4ger8pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
> -    PMXVI4GER8PP nothing {mma,quad}
> +    PMXVI4GER8PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvi4ger8pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
>      PMXVI4GER8PP_INTERNAL mma_pmxvi4ger8pp {mma,quad}
> 
>    void __builtin_mma_pmxvi8ger4 (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
> -    PMXVI8GER4 nothing {mma}
> +    PMXVI8GER4 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_pmxvi8ger4_internal (vuc, vuc, const int<4>, const int<4>, const int<4>);
>      PMXVI8GER4_INTERNAL mma_pmxvi8ger4 {mma}
> 
>    void __builtin_mma_pmxvi8ger4pp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
> -    PMXVI8GER4PP nothing {mma,quad}
> +    PMXVI8GER4PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvi8ger4pp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
>      PMXVI8GER4PP_INTERNAL mma_pmxvi8ger4pp {mma,quad}
> 
>    void __builtin_mma_pmxvi8ger4spp (v512 *, vuc, vuc, const int<4>, const int<4>, const int<4>);
> -    PMXVI8GER4SPP nothing {mma,quad}
> +    PMXVI8GER4SPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_pmxvi8ger4spp_internal (v512, vuc, vuc, const int<4>, const int<4>, const int<4>);
>      PMXVI8GER4SPP_INTERNAL mma_pmxvi8ger4spp {mma,quad}
> 
>    void __builtin_mma_xvbf16ger2 (v512 *, vuc, vuc);
> -    XVBF16GER2 nothing {mma}
> +    XVBF16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvbf16ger2_internal (vuc, vuc);
>      XVBF16GER2_INTERNAL mma_xvbf16ger2 {mma}
> 
>    void __builtin_mma_xvbf16ger2nn (v512 *, vuc, vuc);
> -    XVBF16GER2NN nothing {mma,quad}
> +    XVBF16GER2NN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvbf16ger2nn_internal (v512, vuc, vuc);
>      XVBF16GER2NN_INTERNAL mma_xvbf16ger2nn {mma,quad}
> 
>    void __builtin_mma_xvbf16ger2np (v512 *, vuc, vuc);
> -    XVBF16GER2NP nothing {mma,quad}
> +    XVBF16GER2NP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvbf16ger2np_internal (v512, vuc, vuc);
>      XVBF16GER2NP_INTERNAL mma_xvbf16ger2np {mma,quad}
> 
>    void __builtin_mma_xvbf16ger2pn (v512 *, vuc, vuc);
> -    XVBF16GER2PN nothing {mma,quad}
> +    XVBF16GER2PN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvbf16ger2pn_internal (v512, vuc, vuc);
>      XVBF16GER2PN_INTERNAL mma_xvbf16ger2pn {mma,quad}
> 
>    void __builtin_mma_xvbf16ger2pp (v512 *, vuc, vuc);
> -    XVBF16GER2PP nothing {mma,quad}
> +    XVBF16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvbf16ger2pp_internal (v512, vuc, vuc);
>      XVBF16GER2PP_INTERNAL mma_xvbf16ger2pp {mma,quad}
> 
>    void __builtin_mma_xvf16ger2 (v512 *, vuc, vuc);
> -    XVF16GER2 nothing {mma}
> +    XVF16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvf16ger2_internal (vuc, vuc);
>      XVF16GER2_INTERNAL mma_xvf16ger2 {mma}
> 
>    void __builtin_mma_xvf16ger2nn (v512 *, vuc, vuc);
> -    XVF16GER2NN nothing {mma,quad}
> +    XVF16GER2NN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf16ger2nn_internal (v512, vuc, vuc);
>      XVF16GER2NN_INTERNAL mma_xvf16ger2nn {mma,quad}
> 
>    void __builtin_mma_xvf16ger2np (v512 *, vuc, vuc);
> -    XVF16GER2NP nothing {mma,quad}
> +    XVF16GER2NP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf16ger2np_internal (v512, vuc, vuc);
>      XVF16GER2NP_INTERNAL mma_xvf16ger2np {mma,quad}
> 
>    void __builtin_mma_xvf16ger2pn (v512 *, vuc, vuc);
> -    XVF16GER2PN nothing {mma,quad}
> +    XVF16GER2PN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf16ger2pn_internal (v512, vuc, vuc);
>      XVF16GER2PN_INTERNAL mma_xvf16ger2pn {mma,quad}
> 
>    void __builtin_mma_xvf16ger2pp (v512 *, vuc, vuc);
> -    XVF16GER2PP nothing {mma,quad}
> +    XVF16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf16ger2pp_internal (v512, vuc, vuc);
>      XVF16GER2PP_INTERNAL mma_xvf16ger2pp {mma,quad}
> 
>    void __builtin_mma_xvf32ger (v512 *, vuc, vuc);
> -    XVF32GER nothing {mma}
> +    XVF32GER nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvf32ger_internal (vuc, vuc);
>      XVF32GER_INTERNAL mma_xvf32ger {mma}
> 
>    void __builtin_mma_xvf32gernn (v512 *, vuc, vuc);
> -    XVF32GERNN nothing {mma,quad}
> +    XVF32GERNN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf32gernn_internal (v512, vuc, vuc);
>      XVF32GERNN_INTERNAL mma_xvf32gernn {mma,quad}
> 
>    void __builtin_mma_xvf32gernp (v512 *, vuc, vuc);
> -    XVF32GERNP nothing {mma,quad}
> +    XVF32GERNP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf32gernp_internal (v512, vuc, vuc);
>      XVF32GERNP_INTERNAL mma_xvf32gernp {mma,quad}
> 
>    void __builtin_mma_xvf32gerpn (v512 *, vuc, vuc);
> -    XVF32GERPN nothing {mma,quad}
> +    XVF32GERPN nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf32gerpn_internal (v512, vuc, vuc);
>      XVF32GERPN_INTERNAL mma_xvf32gerpn {mma,quad}
> 
>    void __builtin_mma_xvf32gerpp (v512 *, vuc, vuc);
> -    XVF32GERPP nothing {mma,quad}
> +    XVF32GERPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvf32gerpp_internal (v512, vuc, vuc);
>      XVF32GERPP_INTERNAL mma_xvf32gerpp {mma,quad}
> 
>    void __builtin_mma_xvf64ger (v512 *, v256, vuc);
> -    XVF64GER nothing {mma,pair}
> +    XVF64GER nothing {mma,pair,mmaint}
> 
>    v512 __builtin_mma_xvf64ger_internal (v256, vuc);
>      XVF64GER_INTERNAL mma_xvf64ger {mma,pair}
> 
>    void __builtin_mma_xvf64gernn (v512 *, v256, vuc);
> -    XVF64GERNN nothing {mma,pair,quad}
> +    XVF64GERNN nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_xvf64gernn_internal (v512, v256, vuc);
>      XVF64GERNN_INTERNAL mma_xvf64gernn {mma,pair,quad}
> 
>    void __builtin_mma_xvf64gernp (v512 *, v256, vuc);
> -    XVF64GERNP nothing {mma,pair,quad}
> +    XVF64GERNP nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_xvf64gernp_internal (v512, v256, vuc);
>      XVF64GERNP_INTERNAL mma_xvf64gernp {mma,pair,quad}
> 
>    void __builtin_mma_xvf64gerpn (v512 *, v256, vuc);
> -    XVF64GERPN nothing {mma,pair,quad}
> +    XVF64GERPN nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_xvf64gerpn_internal (v512, v256, vuc);
>      XVF64GERPN_INTERNAL mma_xvf64gerpn {mma,pair,quad}
> 
>    void __builtin_mma_xvf64gerpp (v512 *, v256, vuc);
> -    XVF64GERPP nothing {mma,pair,quad}
> +    XVF64GERPP nothing {mma,pair,quad,mmaint}
> 
>    v512 __builtin_mma_xvf64gerpp_internal (v512, v256, vuc);
>      XVF64GERPP_INTERNAL mma_xvf64gerpp {mma,pair,quad}
> 
>    void __builtin_mma_xvi16ger2 (v512 *, vuc, vuc);
> -    XVI16GER2 nothing {mma}
> +    XVI16GER2 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvi16ger2_internal (vuc, vuc);
>      XVI16GER2_INTERNAL mma_xvi16ger2 {mma}
> 
>    void __builtin_mma_xvi16ger2pp (v512 *, vuc, vuc);
> -    XVI16GER2PP nothing {mma,quad}
> +    XVI16GER2PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvi16ger2pp_internal (v512, vuc, vuc);
>      XVI16GER2PP_INTERNAL mma_xvi16ger2pp {mma,quad}
> 
>    void __builtin_mma_xvi16ger2s (v512 *, vuc, vuc);
> -    XVI16GER2S nothing {mma}
> +    XVI16GER2S nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvi16ger2s_internal (vuc, vuc);
>      XVI16GER2S_INTERNAL mma_xvi16ger2s {mma}
> 
>    void __builtin_mma_xvi16ger2spp (v512 *, vuc, vuc);
> -    XVI16GER2SPP nothing {mma,quad}
> +    XVI16GER2SPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvi16ger2spp_internal (v512, vuc, vuc);
>      XVI16GER2SPP_INTERNAL mma_xvi16ger2spp {mma,quad}
> 
>    void __builtin_mma_xvi4ger8 (v512 *, vuc, vuc);
> -    XVI4GER8 nothing {mma}
> +    XVI4GER8 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvi4ger8_internal (vuc, vuc);
>      XVI4GER8_INTERNAL mma_xvi4ger8 {mma}
> 
>    void __builtin_mma_xvi4ger8pp (v512 *, vuc, vuc);
> -    XVI4GER8PP nothing {mma,quad}
> +    XVI4GER8PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvi4ger8pp_internal (v512, vuc, vuc);
>      XVI4GER8PP_INTERNAL mma_xvi4ger8pp {mma,quad}
> 
>    void __builtin_mma_xvi8ger4 (v512 *, vuc, vuc);
> -    XVI8GER4 nothing {mma}
> +    XVI8GER4 nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xvi8ger4_internal (vuc, vuc);
>      XVI8GER4_INTERNAL mma_xvi8ger4 {mma}
> 
>    void __builtin_mma_xvi8ger4pp (v512 *, vuc, vuc);
> -    XVI8GER4PP nothing {mma,quad}
> +    XVI8GER4PP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvi8ger4pp_internal (v512, vuc, vuc);
>      XVI8GER4PP_INTERNAL mma_xvi8ger4pp {mma,quad}
> 
>    void __builtin_mma_xvi8ger4spp (v512 *, vuc, vuc);
> -    XVI8GER4SPP nothing {mma,quad}
> +    XVI8GER4SPP nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xvi8ger4spp_internal (v512, vuc, vuc);
>      XVI8GER4SPP_INTERNAL mma_xvi8ger4spp {mma,quad}
> 
>    void __builtin_mma_xxmfacc (v512 *);
> -    XXMFACC nothing {mma,quad}
> +    XXMFACC nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xxmfacc_internal (v512);
>      XXMFACC_INTERNAL mma_xxmfacc {mma,quad}
> 
>    void __builtin_mma_xxmtacc (v512 *);
> -    XXMTACC nothing {mma,quad}
> +    XXMTACC nothing {mma,quad,mmaint}
> 
>    v512 __builtin_mma_xxmtacc_internal (v512);
>      XXMTACC_INTERNAL mma_xxmtacc {mma,quad}
> 
>    void __builtin_mma_xxsetaccz (v512 *);
> -    XXSETACCZ nothing {mma}
> +    XXSETACCZ nothing {mma,mmaint}
> 
>    v512 __builtin_mma_xxsetaccz_internal ();
>      XXSETACCZ_INTERNAL mma_xxsetaccz {mma}
> 
>    void __builtin_vsx_assemble_pair (v256 *, vuc, vuc);
> -    ASSEMBLE_PAIR_V nothing {mma}
> +    ASSEMBLE_PAIR_V nothing {mma,mmaint}
> 
>    v256 __builtin_vsx_assemble_pair_internal (vuc, vuc);
>      ASSEMBLE_PAIR_V_INTERNAL vsx_assemble_pair {mma}
> 
>    void __builtin_vsx_build_pair (v256 *, vuc, vuc);
> -    BUILD_PAIR nothing {mma}
> +    BUILD_PAIR nothing {mma,mmaint}
> 
>    v256 __builtin_vsx_build_pair_internal (vuc, vuc);
>      BUILD_PAIR_INTERNAL vsx_assemble_pair {mma}
> 
>    void __builtin_vsx_disassemble_pair (void *, v256 *);
> -    DISASSEMBLE_PAIR_V nothing {mma,pair}
> +    DISASSEMBLE_PAIR_V nothing {mma,pair,mmaint}
> 
>    vuc __builtin_vsx_disassemble_pair_internal (v256, const int<2>);
>      DISASSEMBLE_PAIR_V_INTERNAL vsx_disassemble_pair {mma}
> +
> +  v256 __builtin_vsx_lxvp (unsigned long, const v256 *);
> +    LXVP nothing {mma}
> +
> +  void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
> +    STXVP nothing {mma,pair}
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index eae4e15df1e..558f06cfd6c 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -13072,8 +13072,10 @@ rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
> 
>    /* Each call that can be gimple-expanded has an associated built-in
>       function that it will expand into.  If this one doesn't, we have
> -     already expanded it!  */
> -  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
> +     already expanded it!  Exceptions: lxvp and stxvp.  */
> +  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE
> +      && fncode != RS6000_BIF_LXVP
> +      && fncode != RS6000_BIF_STXVP)
>      return false;
> 
>    bifdata *bd = &rs6000_builtin_info_x[fncode];
> @@ -13150,6 +13152,38 @@ rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
>        gsi_replace_with_seq (gsi, new_seq, true);
>        return true;
>      }
> +  else if (fncode == RS6000_BIF_LXVP)
> +    {
> +      push_gimplify_context (true);
> +      tree offset = gimple_call_arg (stmt, 0);
> +      tree ptr = gimple_call_arg (stmt, 1);
> +      tree lhs = gimple_call_lhs (stmt);
> +      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
> +	ptr = build1 (VIEW_CONVERT_EXPR,
> +		      build_pointer_type (vector_pair_type_node), ptr);
> +      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
> +					       TREE_TYPE (ptr), ptr, offset));
> +      gimplify_assign (lhs, mem, &new_seq);
> +      pop_gimplify_context (NULL);
> +      gsi_replace_with_seq (gsi, new_seq, true);
> +      return true;
> +    }
> +  else if (fncode == RS6000_BIF_STXVP)
> +    {
> +      push_gimplify_context (true);
> +      tree src = gimple_call_arg (stmt, 0);
> +      tree offset = gimple_call_arg (stmt, 1);
> +      tree ptr = gimple_call_arg (stmt, 2);
> +      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
> +	ptr = build1 (VIEW_CONVERT_EXPR,
> +		      build_pointer_type (vector_pair_type_node), ptr);
> +      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
> +					       TREE_TYPE (ptr), ptr, offset));
> +      gimplify_assign (mem, src, &new_seq);
> +      pop_gimplify_context (NULL);
> +      gsi_replace_with_seq (gsi, new_seq, true);
> +      return true;
> +    }
> 

ok
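
Also for reference, a sketch of the store side and, per the
gimplify_assign above, roughly what it folds to:

  void
  stxvp_demo (__vector_pair *p, unsigned long off, __vector_pair v)
  {
    __builtin_vsx_stxvp (v, off, p);
    /* After folding: MEM[(__vector_pair *)p + off] = v;  */
  }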

>    /* Convert this built-in into an internal version that uses pass-by-value
>       arguments.  The internal built-in is found in the assoc_bif field.  */
> diff --git a/gcc/config/rs6000/rs6000-gen-builtins.c b/gcc/config/rs6000/rs6000-gen-builtins.c
> index f65932e1cd5..7f711210aff 100644
> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -84,6 +84,7 @@ along with GCC; see the file COPYING3.  If not see
>       mma      Needs special handling for MMA instructions
>       quad     MMA instruction using a register quad as an input operand
>       pair     MMA instruction using a register pair as an input operand
> +     mmaint   MMA instruction expanding to internal call at GIMPLE time
>       no32bit  Not valid for TARGET_32BIT
>       32bit    Requires different handling for TARGET_32BIT
>       cpu      This is a "cpu_is" or "cpu_supports" builtin
> @@ -369,6 +370,7 @@ struct attrinfo
>    bool ismma;
>    bool isquad;
>    bool ispair;
> +  bool ismmaint;
>    bool isno32bit;
>    bool is32bit;
>    bool iscpu;
> @@ -1363,6 +1365,8 @@ parse_bif_attrs (attrinfo *attrptr)
>  	  attrptr->isquad = 1;
>  	else if (!strcmp (attrname, "pair"))
>  	  attrptr->ispair = 1;
> +	else if (!strcmp (attrname, "mmaint"))
> +	  attrptr->ismmaint = 1;
>  	else if (!strcmp (attrname, "no32bit"))
>  	  attrptr->isno32bit = 1;
>  	else if (!strcmp (attrname, "32bit"))
> @@ -1409,15 +1413,15 @@ parse_bif_attrs (attrinfo *attrptr)
>    (*diag) ("attribute set: init = %d, set = %d, extract = %d, nosoft = %d, "
>  	   "ldvec = %d, stvec = %d, reve = %d, pred = %d, htm = %d, "
>  	   "htmspr = %d, htmcr = %d, mma = %d, quad = %d, pair = %d, "
> -	   "no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, lxvrse = %d, "
> -	   "lxvrze = %d, endian = %d.\n",
> +	   "mmaint = %d, no32bit = %d, 32bit = %d, cpu = %d, ldstmask = %d, "
> +	   "lxvrse = %d, lxvrze = %d, endian = %d.\n",
>  	   attrptr->isinit, attrptr->isset, attrptr->isextract,
>  	   attrptr->isnosoft, attrptr->isldvec, attrptr->isstvec,
>  	   attrptr->isreve, attrptr->ispred, attrptr->ishtm, attrptr->ishtmspr,
>  	   attrptr->ishtmcr, attrptr->ismma, attrptr->isquad, attrptr->ispair,
> -	   attrptr->isno32bit, attrptr->is32bit, attrptr->iscpu,
> -	   attrptr->isldstmask, attrptr->islxvrse, attrptr->islxvrze,
> -	   attrptr->isendian);
> +	   attrptr->ismmaint, attrptr->isno32bit, attrptr->is32bit,
> +	   attrptr->iscpu, attrptr->isldstmask, attrptr->islxvrse,
> +	   attrptr->islxvrze, attrptr->isendian);
>  #endif
> 
>    return PC_OK;
> @@ -2223,13 +2227,14 @@ write_decls (void)
>    fprintf (header_file, "#define bif_mma_bit\t\t(0x00000800)\n");
>    fprintf (header_file, "#define bif_quad_bit\t\t(0x00001000)\n");
>    fprintf (header_file, "#define bif_pair_bit\t\t(0x00002000)\n");
> -  fprintf (header_file, "#define bif_no32bit_bit\t\t(0x00004000)\n");
> -  fprintf (header_file, "#define bif_32bit_bit\t\t(0x00008000)\n");
> -  fprintf (header_file, "#define bif_cpu_bit\t\t(0x00010000)\n");
> -  fprintf (header_file, "#define bif_ldstmask_bit\t(0x00020000)\n");
> -  fprintf (header_file, "#define bif_lxvrse_bit\t\t(0x00040000)\n");
> -  fprintf (header_file, "#define bif_lxvrze_bit\t\t(0x00080000)\n");
> -  fprintf (header_file, "#define bif_endian_bit\t\t(0x00100000)\n");
> +  fprintf (header_file, "#define bif_mmaint_bit\t\t(0x00004000)\n");
> +  fprintf (header_file, "#define bif_no32bit_bit\t\t(0x00008000)\n");
> +  fprintf (header_file, "#define bif_32bit_bit\t\t(0x00010000)\n");
> +  fprintf (header_file, "#define bif_cpu_bit\t\t(0x00020000)\n");
> +  fprintf (header_file, "#define bif_ldstmask_bit\t(0x00040000)\n");
> +  fprintf (header_file, "#define bif_lxvrse_bit\t\t(0x00080000)\n");
> +  fprintf (header_file, "#define bif_lxvrze_bit\t\t(0x00100000)\n");
> +  fprintf (header_file, "#define bif_endian_bit\t\t(0x00200000)\n");
>    fprintf (header_file, "\n");

ok

>    fprintf (header_file,
>  	   "#define bif_is_init(x)\t\t((x).bifattrs & bif_init_bit)\n");
> @@ -2259,6 +2264,8 @@ write_decls (void)
>  	   "#define bif_is_quad(x)\t\t((x).bifattrs & bif_quad_bit)\n");
>    fprintf (header_file,
>  	   "#define bif_is_pair(x)\t\t((x).bifattrs & bif_pair_bit)\n");
> +  fprintf (header_file,
> +	   "#define bif_is_mmaint(x)\t\t((x).bifattrs & bif_mmaint_bit)\n");
>    fprintf (header_file,
>  	   "#define bif_is_no32bit(x)\t((x).bifattrs & bif_no32bit_bit)\n");
>    fprintf (header_file,
> @@ -2491,6 +2498,8 @@ write_bif_static_init (void)
>  	fprintf (init_file, " | bif_quad_bit");
>        if (bifp->attrs.ispair)
>  	fprintf (init_file, " | bif_pair_bit");
> +      if (bifp->attrs.ismmaint)
> +	fprintf (init_file, " | bif_mmaint_bit");
>        if (bifp->attrs.isno32bit)
>  	fprintf (init_file, " | bif_no32bit_bit");
>        if (bifp->attrs.is32bit)
> @@ -2537,10 +2546,9 @@ write_bif_static_init (void)
>  		: (bifp->kind == FNK_PURE ? "= pure"
>  		   : (bifp->kind == FNK_FPMATH ? "= fp, const"
>  		      : ""))));
> -      bool no_icode = !strcmp (bifp->patname, "nothing");
>        fprintf (init_file, "      /* assoc_bif */\tRS6000_BIF_%s%s\n",
> -	       bifp->attrs.ismma && no_icode ? bifp->idname : "NONE",
> -	       bifp->attrs.ismma && no_icode ? "_INTERNAL" : "");
> +	       bifp->attrs.ismmaint ? bifp->idname : "NONE",
> +	       bifp->attrs.ismmaint ? "_INTERNAL" : "");

Ok.  (I did look; it does appear there are other references to
.ismma, so this is simply a different flag rather than a
replacement/rename.)


>        fprintf (init_file, "    },\n");
>      }
>    fprintf (init_file, "  };\n\n");

lgtm,
thanks
-Will





^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 05/18] rs6000: Support for vectorizing built-in functions
  2021-09-01 16:13 ` [PATCH 05/18] rs6000: Support for vectorizing built-in functions Bill Schmidt
@ 2021-09-13 19:29   ` will schmidt
  2021-09-17 12:17   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: will schmidt @ 2021-09-13 19:29 UTC (permalink / raw)
  To: Bill Schmidt, gcc-patches; +Cc: dje.gcc, segher

On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
> This patch just duplicates a couple of functions and adjusts them to use the
> new builtin names.  There's no logical change otherwise.
> 
> 2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/rs6000.c (rs6000-builtins.h): New include.
> 	(rs6000_new_builtin_vectorized_function): New function.
> 	(rs6000_new_builtin_md_vectorized_function): Likewise.
> 	(rs6000_builtin_vectorized_function): Call
> 	rs6000_new_builtin_vectorized_function.
> 	(rs6000_builtin_md_vectorized_function): Call
> 	rs6000_new_builtin_md_vectorized_function.

ok

> ---
>  gcc/config/rs6000/rs6000.c | 253 +++++++++++++++++++++++++++++++++++++
>  1 file changed, 253 insertions(+)
> 
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index b7ea1483da5..52c78c7500c 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -78,6 +78,7 @@
>  #include "case-cfn-macros.h"
>  #include "ppc-auxv.h"
>  #include "rs6000-internal.h"
> +#include "rs6000-builtins.h"
>  #include "opts.h"
> 
>  /* This file should be included last.  */
> @@ -5501,6 +5502,251 @@ rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
>    return nunroll;
>  }
> 
> +/* Returns a function decl for a vectorized version of the builtin function
> +   with builtin function code FN and the result vector type TYPE, or NULL_TREE
> +   if it is not available.  */
> +
> +static tree
> +rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
> +					tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
> +	     combined_fn_name (combined_fn (fn)),
> +	     GET_MODE_NAME (TYPE_MODE (type_out)),
> +	     GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +      || TREE_CODE (type_in) != VECTOR_TYPE)
> +    return NULL_TREE;
> +
> +  out_mode = TYPE_MODE (TREE_TYPE (type_out));
> +  out_n = TYPE_VECTOR_SUBPARTS (type_out);
> +  in_mode = TYPE_MODE (TREE_TYPE (type_in));
> +  in_n = TYPE_VECTOR_SUBPARTS (type_in);
> +
> +  switch (fn)
> +    {
> +    CASE_CFN_COPYSIGN:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_CPSGNDP];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_CPSGNSP];
> +      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_COPYSIGN_V4SF];
> +      break;
> +    CASE_CFN_CEIL:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIP];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIP];
> +      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VRFIP];
> +      break;
> +    CASE_CFN_FLOOR:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIM];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIM];
> +      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VRFIM];
> +      break;
> +    CASE_CFN_FMA:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVMADDDP];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVMADDSP];
> +      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VMADDFP];
> +      break;
> +    CASE_CFN_TRUNC:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIZ];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIZ];
> +      if (VECTOR_UNIT_ALTIVEC_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VRFIZ];
> +      break;
> +    CASE_CFN_NEARBYINT:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && flag_unsafe_math_optimizations
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPI];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && flag_unsafe_math_optimizations
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPI];
> +      break;
> +    CASE_CFN_RINT:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && !flag_trapping_math
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRDPIC];
> +      if (VECTOR_UNIT_VSX_P (V4SFmode)
> +	  && !flag_trapping_math
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_XVRSPIC];
> +      break;
> +    default:
> +      break;
> +    }
> +
> +  /* Generate calls to libmass if appropriate.  */
> +  if (rs6000_veclib_handler)
> +    return rs6000_veclib_handler (combined_fn (fn), type_out, type_in);
> +
> +  return NULL_TREE;
> +}
> +

ok

> +/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
> +
> +static tree
> +rs6000_new_builtin_md_vectorized_function (tree fndecl, tree type_out,
> +					   tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr,
> +	     "rs6000_new_builtin_md_vectorized_function (%s, %s, %s)\n",
> +	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +	     GET_MODE_NAME (TYPE_MODE (type_out)),
> +	     GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +      || TREE_CODE (type_in) != VECTOR_TYPE)
> +    return NULL_TREE;
> +
> +  out_mode = TYPE_MODE (TREE_TYPE (type_out));
> +  out_n = TYPE_VECTOR_SUBPARTS (type_out);
> +  in_mode = TYPE_MODE (TREE_TYPE (type_in));
> +  in_n = TYPE_VECTOR_SUBPARTS (type_in);
> +
> +  enum rs6000_gen_builtins fn
> +    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +  switch (fn)
> +    {
> +    case RS6000_BIF_RSQRTF:
> +      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VRSQRTFP];
> +      break;
> +    case RS6000_BIF_RSQRT:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_RSQRT_2DF];
> +      break;
> +    case RS6000_BIF_RECIPF:
> +      if (VECTOR_UNIT_ALTIVEC_OR_VSX_P (V4SFmode)
> +	  && out_mode == SFmode && out_n == 4
> +	  && in_mode == SFmode && in_n == 4)
> +	return rs6000_builtin_decls_x[RS6000_BIF_VRECIPFP];
> +      break;
> +    case RS6000_BIF_RECIP:
> +      if (VECTOR_UNIT_VSX_P (V2DFmode)
> +	  && out_mode == DFmode && out_n == 2
> +	  && in_mode == DFmode && in_n == 2)
> +	return rs6000_builtin_decls_x[RS6000_BIF_RECIP_V2DF];
> +      break;
> +    default:
> +      break;
> +    }
> +
> +  machine_mode in_vmode = TYPE_MODE (type_in);
> +  machine_mode out_vmode = TYPE_MODE (type_out);
> +
> +  /* Power10 supported vectorized built-in functions.  */
> +  if (TARGET_POWER10
> +      && in_vmode == out_vmode
> +      && VECTOR_UNIT_ALTIVEC_OR_VSX_P (in_vmode))
> +    {
> +      machine_mode exp_mode = DImode;
> +      machine_mode exp_vmode = V2DImode;
> +      enum rs6000_gen_builtins bif;
> +      switch (fn)
> +	{
> +	case RS6000_BIF_DIVWE:
> +	case RS6000_BIF_DIVWEU:
> +	  exp_mode = SImode;
> +	  exp_vmode = V4SImode;
> +	  if (fn == RS6000_BIF_DIVWE)
> +	    bif = RS6000_BIF_VDIVESW;
> +	  else
> +	    bif = RS6000_BIF_VDIVEUW;
> +	  break;
> +	case RS6000_BIF_DIVDE:
> +	case RS6000_BIF_DIVDEU:
> +	  if (fn == RS6000_BIF_DIVDE)
> +	    bif = RS6000_BIF_VDIVESD;
> +	  else
> +	    bif = RS6000_BIF_VDIVEUD;
> +	  break;
> +	case RS6000_BIF_CFUGED:
> +	  bif = RS6000_BIF_VCFUGED;
> +	  break;
> +	case RS6000_BIF_CNTLZDM:
> +	  bif = RS6000_BIF_VCLZDM;
> +	  break;
> +	case RS6000_BIF_CNTTZDM:
> +	  bif = RS6000_BIF_VCTZDM;
> +	  break;
> +	case RS6000_BIF_PDEPD:
> +	  bif = RS6000_BIF_VPDEPD;
> +	  break;
> +	case RS6000_BIF_PEXTD:
> +	  bif = RS6000_BIF_VPEXTD;
> +	  break;
> +	default:
> +	  return NULL_TREE;
> +	}
> +
> +      if (in_mode == exp_mode && in_vmode == exp_vmode)
> +	return rs6000_builtin_decls_x[bif];
> +    }
> +
> +  return NULL_TREE;
> +}


ok

> +
>  /* Handler for the Mathematical Acceleration Subsystem (mass) interface to a
>     library with vectorized intrinsics.  */
> 
> @@ -5620,6 +5866,9 @@ rs6000_builtin_vectorized_function (unsigned int fn, tree type_out,
>    machine_mode in_mode, out_mode;
>    int in_n, out_n;
> 
> +  if (new_builtins_are_live)
> +    return rs6000_new_builtin_vectorized_function (fn, type_out, type_in);
> +
>    if (TARGET_DEBUG_BUILTIN)
>      fprintf (stderr, "rs6000_builtin_vectorized_function (%s, %s, %s)\n",
>  	     combined_fn_name (combined_fn (fn)),
> @@ -5751,6 +6000,10 @@ rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
>    machine_mode in_mode, out_mode;
>    int in_n, out_n;
> 
> +  if (new_builtins_are_live)
> +    return rs6000_new_builtin_md_vectorized_function (fndecl, type_out,
> +						      type_in);
> +
>    if (TARGET_DEBUG_BUILTIN)
>      fprintf (stderr, "rs6000_builtin_md_vectorized_function (%s, %s, %s)\n",
>  	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),

ok.


lgtm, 
thanks
-Will





^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 01/18] rs6000: Handle overloads during program parsing
  2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
  2021-09-13 17:17   ` will schmidt
@ 2021-09-13 23:53   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-09-13 23:53 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:37AM -0500, Bill Schmidt wrote:
> Although this patch looks quite large, the changes are fairly minimal.
> Most of it is duplicating the large function that does the overload
> resolution using the automatically generated data structures instead of
> the old hand-generated ones.  This doesn't make the patch terribly easy to
> review, unfortunately.  Just be aware that generally we aren't changing
> the logic and functionality of overload handling.

> 	(altivec_build_new_resolved_builtin): New function.
> 	(altivec_resolve_new_overloaded_builtin): Likewise.

A new function of 973 lines (plus the function comment).  Please factor
that (it can be in a later patch, but please do; you know what it all
means and does currently, so now is the time :-) ).

> +static bool
> +rs6000_new_builtin_type_compatible (tree t, tree u)

This needs a function comment.  Are t and u used symmetrically at all?
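
Something like this, perhaps (I'm guessing the intended semantics from
the const handling below, so adjust as needed):

  /* Return true if an argument of type T is compatible with the
     builtin's declared parameter type U.  T and U are not symmetric:
     only U's constness is propagated to T.  */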

> +{
> +  if (t == error_mark_node)
> +    return false;

(not here)

> +  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
> +    {
> +      t = TREE_TYPE (t);
> +      u = TREE_TYPE (u);
> +      if (TYPE_READONLY (u))
> +	t = build_qualified_type (t, TYPE_QUAL_CONST);
> +    }

Esp. here.  And it still creates junk trees where those are not needed
afaics, and that is not a great idea for functions that are called so
often.
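
For instance (untested sketch), only build a qualified type when that
actually changes anything:

  if (POINTER_TYPE_P (t) && POINTER_TYPE_P (u))
    {
      t = TREE_TYPE (t);
      u = TREE_TYPE (u);
      /* Skip the tree allocation when T is already const.  */
      if (TYPE_READONLY (u) && !TYPE_READONLY (t))
        t = build_qualified_type (t, TYPE_QUAL_CONST);
    }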

> +static tree
> +altivec_build_new_resolved_builtin (tree *args, int n, tree fntype,
> +				    tree ret_type,
> +				    rs6000_gen_builtins bif_id,
> +				    rs6000_gen_builtins ovld_id)
> +{
> +  tree argtypes = TYPE_ARG_TYPES (fntype);
> +  tree arg_type[MAX_OVLD_ARGS];
> +  tree fndecl = rs6000_builtin_decls_x[bif_id];
> +  tree call;

Don't declare things so far ahead please.  Declare them right before
they are assigned to, ideally.

> +  for (int i = 0; i < n; i++)
> +    arg_type[i] = TREE_VALUE (argtypes), argtypes = TREE_CHAIN (argtypes);

Please do not use comma operators where you could use separate
statements.
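
That is, the same thing as two statements:

  for (int i = 0; i < n; i++)
    {
      arg_type[i] = TREE_VALUE (argtypes);
      argtypes = TREE_CHAIN (argtypes);
    }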

> +  /* The AltiVec overloading implementation is overall gross, but this

Ooh you spell "AltiVec" correctly here ;-)

You can do
  for (int j = 0; j < n; j++)
    args[j] = fully_fold_convert (arg_type[j], args[j]);
here and then the rest becomes simpler.
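
With the conversions hoisted like that, the switch below reduces to
something like (untested sketch):

  switch (n)
    {
    case 0:
      call = build_call_expr (fndecl, 0);
      break;
    case 1:
      call = build_call_expr (fndecl, 1, args[0]);
      break;
    case 2:
      call = build_call_expr (fndecl, 2, args[0], args[1]);
      break;
    case 3:
      call = build_call_expr (fndecl, 3, args[0], args[1], args[2]);
      break;
    case 4:
      call = build_call_expr (fndecl, 4, args[0], args[1], args[2], args[3]);
      break;
    default:
      gcc_unreachable ();
    }

(Or the switch could go away entirely via build_call_expr_loc_array,
if you prefer that.)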

> +  switch (n)
> +    {
> +    case 0:
> +      call = build_call_expr (fndecl, 0);
> +      break;
> +    case 1:
> +      call = build_call_expr (fndecl, 1,
> +			      fully_fold_convert (arg_type[0], args[0]));
> +      break;
> +    case 2:
> +      call = build_call_expr (fndecl, 2,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]));
> +      break;
> +    case 3:
> +      call = build_call_expr (fndecl, 3,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]),
> +			      fully_fold_convert (arg_type[2], args[2]));
> +      break;
> +    case 4:
> +      call = build_call_expr (fndecl, 4,
> +			      fully_fold_convert (arg_type[0], args[0]),
> +			      fully_fold_convert (arg_type[1], args[1]),
> +			      fully_fold_convert (arg_type[2], args[2]),
> +			      fully_fold_convert (arg_type[3], args[3]));
> +      break;
> +    default:
> +      gcc_unreachable ();
> +    }
> +  return fold_convert (ret_type, call);
> +}

> +static tree
> +altivec_resolve_new_overloaded_builtin (location_t loc, tree fndecl,
> +					void *passed_arglist)
> +{
> +  vec<tree, va_gc> *arglist = static_cast<vec<tree, va_gc> *> (passed_arglist);
> +  unsigned int nargs = vec_safe_length (arglist);
> +  enum rs6000_gen_builtins fcode
> +    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +  tree fnargs = TYPE_ARG_TYPES (TREE_TYPE (fndecl));
> +  tree types[MAX_OVLD_ARGS], args[MAX_OVLD_ARGS];

Two separate lines please; they are very different things, and very
important things, too.
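
That is:

  tree types[MAX_OVLD_ARGS];
  tree args[MAX_OVLD_ARGS];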

> +  unsigned int n;

You use this var first 792 lines later.  Please don't.

Oh well, this will become much better once this is more properly
factored.  Who knows, some of it may become readable / understandable
even!  :-)

> +      arg = (*arglist)[0];
> +      type = TREE_TYPE (arg);
> +      if (!SCALAR_FLOAT_TYPE_P (type)
> +	  && !INTEGRAL_TYPE_P (type))
> +	goto bad;

And all gotos still scream "FACTOR ME".
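
For example, the common exit could be a small helper along these lines
(hypothetical name, and reuse whatever diagnostic the existing bad:
label emits):

  static tree
  altivec_overload_mismatch (location_t loc, tree fndecl)
  {
    error_at (loc, "invalid parameter combination for intrinsic %qD",
              fndecl);
    return error_mark_node;
  }

so the gotos become plain returns.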

> +	  case E_TImode:
> +	    type = (unsigned_p ? unsigned_V1TI_type_node : V1TI_type_node);
> +	    size = 1;
> +	    break;

  type = signed_or_unsigned_type_for (unsigned_p, V1TI_type_node);
etc.

> +	arg2 = build_binary_op (loc, BIT_AND_EXPR, arg2,
> +				build_int_cst (TREE_TYPE (arg2),
> +					       TYPE_VECTOR_SUBPARTS (arg1_type)
> +					       - 1), 0);

This needs some temporaries.  Whenever you are clutching the right
margin chances are you should add some extra names for readability.

> +	  if (TYPE_READONLY (TREE_TYPE (type))
> +	      && !TYPE_READONLY (TREE_TYPE (decl_type)))
> +	    warning (0, "passing argument %d of %qE discards qualifiers from "
> +		     "pointer target type", n + 1, fndecl);

It actually only tests the const qualifier.  Is there no utility
function to test all (or at least cv)?
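
One (untested) option is to compare the full qualifier masks:

  if (TYPE_QUALS (TREE_TYPE (type)) & ~TYPE_QUALS (TREE_TYPE (decl_type)))
    warning (0, "passing argument %d of %qE discards qualifiers from "
             "pointer target type", n + 1, fndecl);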

> +	  type = build_pointer_type (build_qualified_type (TREE_TYPE (type),
> +							   0));

No new line needed.

> +	if ((GET_MODE_PRECISION (arg1_mode) > 32)
> +	    || (GET_MODE_PRECISION (arg2_mode) > 32))

Useless extra parens making things harder to read.
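
i.e.:

  if (GET_MODE_PRECISION (arg1_mode) > 32
      || GET_MODE_PRECISION (arg2_mode) > 32)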

> +bool
> +rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
> +{
> +  switch (rs6000_builtin_info_x[(size_t) fncode].enable)
> +    {
> +    default:
> +      gcc_unreachable ();

default belongs last, not first.

> +    case ENB_ALWAYS:
> +      return true;
> +    case ENB_P5:
> +      return TARGET_POPCNTB;
> +    case ENB_P6:
> +      return TARGET_CMPB;
> +    case ENB_ALTIVEC:
> +      return TARGET_ALTIVEC;
> +    case ENB_CELL:
> +      return TARGET_ALTIVEC && rs6000_cpu == PROCESSOR_CELL;
> +    case ENB_VSX:
> +      return TARGET_VSX;
> +    case ENB_P7:
> +      return TARGET_POPCNTD;
> +    case ENB_P7_64:
> +      return TARGET_POPCNTD && TARGET_POWERPC64;
> +    case ENB_P8:
> +      return TARGET_DIRECT_MOVE;
> +    case ENB_P8V:
> +      return TARGET_P8_VECTOR;
> +    case ENB_P9:
> +      return TARGET_MODULO;
> +    case ENB_P9_64:
> +      return TARGET_MODULO && TARGET_POWERPC64;
> +    case ENB_P9V:
> +      return TARGET_P9_VECTOR;
> +    case ENB_IEEE128_HW:
> +      return TARGET_FLOAT128_HW;
> +    case ENB_DFP:
> +      return TARGET_DFP;
> +    case ENB_CRYPTO:
> +      return TARGET_CRYPTO;
> +    case ENB_HTM:
> +      return TARGET_HTM;
> +    case ENB_P10:
> +      return TARGET_POWER10;
> +    case ENB_P10_64:
> +      return TARGET_POWER10 && TARGET_POWERPC64;
> +    case ENB_MMA:
> +      return TARGET_MMA;
> +    }
> +  gcc_unreachable ();
> +}

Could you put all the CPU ones together (except maybe Cell)?  They really
mean architecture version, and they should be renamed some day perhaps
(the TARGET_ names, that is).  It now is some kind of revisionist
historical order :-)
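
That is, something like (just reordering the cases):

    case ENB_P5:
      return TARGET_POPCNTB;
    case ENB_P6:
      return TARGET_CMPB;
    case ENB_P7:
      return TARGET_POPCNTD;
    case ENB_P7_64:
      return TARGET_POPCNTD && TARGET_POWERPC64;
    case ENB_P8:
      return TARGET_DIRECT_MOVE;
    case ENB_P9:
      return TARGET_MODULO;
    case ENB_P9_64:
      return TARGET_MODULO && TARGET_POWERPC64;
    case ENB_P10:
      return TARGET_POWER10;
    case ENB_P10_64:
      return TARGET_POWER10 && TARGET_POWERPC64;
    /* ...followed by the facility bits: ALTIVEC, VSX, P8V, P9V, DFP,
       CRYPTO, HTM, IEEE128_HW, MMA, and CELL at the end.  */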

> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -2314,7 +2314,7 @@ write_decls (void)
>  
>    fprintf (header_file, "extern void rs6000_init_generated_builtins ();\n\n");
>    fprintf (header_file,
> -	   "extern bool rs6000_new_builtin_is_supported_p "
> +	   "extern bool rs6000_new_builtin_is_supported "
>  	   "(rs6000_gen_builtins);\n");

This now fits on one line, too :-)


Okay for trunk with the trivial things fixed.  And everything else needs
to be fixed later still.

Thanks!


Segher

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins
  2021-09-13 18:42   ` will schmidt
@ 2021-09-14 22:36     ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-14 22:36 UTC (permalink / raw)
  To: will schmidt, gcc-patches; +Cc: dje.gcc, segher

Hi Will,

On 9/13/21 1:42 PM, will schmidt wrote:
> On Wed, 2021-09-01 at 11:13 -0500, Bill Schmidt via Gcc-patches wrote:
>> This is another patch that looks bigger than it really is.  Because we
>> have a new namespace for the builtins, allowing us to have both the old
>> and new builtin infrastructure supported at once, we need versions of
>> these functions that use the new builtin namespace.  Otherwise the code is
>> unchanged.
>>
>> 2021-08-31  Bill Schmidt  <wschmidt@linux.ibm.com>
>>
>> gcc/
>> 	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
>> 	New forward decl.
>> 	(rs6000_gimple_fold_builtin): Call rs6000_gimple_fold_new_builtin.
>> 	(rs6000_new_builtin_valid_without_lhs): New function.
>> 	(rs6000_gimple_fold_new_mma_builtin): Likewise.
>> 	(rs6000_gimple_fold_new_builtin): Likewise.
>> ---
>>   gcc/config/rs6000/rs6000-call.c | 1165 +++++++++++++++++++++++++++++++
>>   1 file changed, 1165 insertions(+)
>>
>> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
>> index 2c68aa3580c..eae4e15df1e 100644
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -190,6 +190,7 @@ static tree builtin_function_type (machine_mode, machine_mode,
>>   static void rs6000_common_init_builtins (void);
>>   static void htm_init_builtins (void);
>>   static void mma_init_builtins (void);
>> +static bool rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi);
>>
>>
>>   /* Hash table to keep track of the argument types for builtin functions.  */
>> @@ -12024,6 +12025,9 @@ rs6000_gimple_fold_mma_builtin (gimple_stmt_iterator *gsi)
>>   bool
>>   rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>   {
>> +  if (new_builtins_are_live)
>> +    return rs6000_gimple_fold_new_builtin (gsi);
>> +
>>     gimple *stmt = gsi_stmt (*gsi);
>>     tree fndecl = gimple_call_fndecl (stmt);
>>     gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
> ok
>
>> @@ -12971,6 +12975,35 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>>     return false;
>>   }
>>
>> +/*  Helper function to sort out which built-ins may be valid without having
>> +    a LHS.  */
>> +static bool
>> +rs6000_new_builtin_valid_without_lhs (enum rs6000_gen_builtins fn_code,
>> +				      tree fndecl)
>> +{
>> +  if (TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node)
>> +    return true;
> Is that an improved version of the code as seen in
> rs6000_builtin_valid_without_lhs?
> That is
>>   if (rs6000_builtin_info[fn_code].attr & RS6000_BTC_VOID)
>>     return true;
> ok either way.
>
It's a required change, because the old attr field has gone away. Good eye.
>> +
>> +  switch (fn_code)
>> +    {
>> +    case RS6000_BIF_STVX_V16QI:
>> +    case RS6000_BIF_STVX_V8HI:
>> +    case RS6000_BIF_STVX_V4SI:
>> +    case RS6000_BIF_STVX_V4SF:
>> +    case RS6000_BIF_STVX_V2DI:
>> +    case RS6000_BIF_STVX_V2DF:
>> +    case RS6000_BIF_STXVW4X_V16QI:
>> +    case RS6000_BIF_STXVW4X_V8HI:
>> +    case RS6000_BIF_STXVW4X_V4SF:
>> +    case RS6000_BIF_STXVW4X_V4SI:
>> +    case RS6000_BIF_STXVD2X_V2DF:
>> +    case RS6000_BIF_STXVD2X_V2DI:
>> +      return true;
>> +    default:
>> +      return false;
>> +    }
>> +}
>> +
>>   /* Check whether a builtin function is supported in this target
>>      configuration.  */
>>   bool
>> @@ -13024,6 +13057,1138 @@ rs6000_new_builtin_is_supported (enum rs6000_gen_builtins fncode)
>>     gcc_unreachable ();
>>   }
>>
>> +/* Expand the MMA built-ins early, so that we can convert the pass-by-reference
>> +   __vector_quad arguments into pass-by-value arguments, leading to more
>> +   efficient code generation.  */
>> +static bool
>> +rs6000_gimple_fold_new_mma_builtin (gimple_stmt_iterator *gsi,
>> +				    rs6000_gen_builtins fn_code)
>> +{
>> +  gimple *stmt = gsi_stmt (*gsi);
>> +  size_t fncode = (size_t) fn_code;
>> +
>> +  if (!bif_is_mma (rs6000_builtin_info_x[fncode]))
>> +    return false;
>> +
>> +  /* Each call that can be gimple-expanded has an associated built-in
>> +     function that it will expand into.  If this one doesn't, we have
>> +     already expanded it!  */
>> +  if (rs6000_builtin_info_x[fncode].assoc_bif == RS6000_BIF_NONE)
>> +    return false;
>> +
>> +  bifdata *bd = &rs6000_builtin_info_x[fncode];
>> +  unsigned nopnds = bd->nargs;
>> +  gimple_seq new_seq = NULL;
>> +  gimple *new_call;
>> +  tree new_decl;
>> +
>> +  /* Compatibility built-ins; we used to call these
>> +     __builtin_mma_{dis,}assemble_pair, but now we call them
>> +     __builtin_vsx_{dis,}assemble_pair.  Handle the old versions.  */
>> +  if (fncode == RS6000_BIF_ASSEMBLE_PAIR)
>> +    fncode = RS6000_BIF_ASSEMBLE_PAIR_V;
>> +  else if (fncode == RS6000_BIF_DISASSEMBLE_PAIR)
>> +    fncode = RS6000_BIF_DISASSEMBLE_PAIR_V;
>> +
>> +  if (fncode == RS6000_BIF_DISASSEMBLE_ACC
>> +      || fncode == RS6000_BIF_DISASSEMBLE_PAIR_V)
>> +    {
>> +      /* This is an MMA disassemble built-in function.  */
>> +      push_gimplify_context (true);
>> +      unsigned nvec = (fncode == RS6000_BIF_DISASSEMBLE_ACC) ? 4 : 2;
>> +      tree dst_ptr = gimple_call_arg (stmt, 0);
>> +      tree src_ptr = gimple_call_arg (stmt, 1);
>> +      tree src_type = TREE_TYPE (src_ptr);
>> +      tree src = create_tmp_reg_or_ssa_name (TREE_TYPE (src_type));
>> +      gimplify_assign (src, build_simple_mem_ref (src_ptr), &new_seq);
>> +
>> +      /* If we are not disassembling an accumulator/pair or our destination is
>> +	 another accumulator/pair, then just copy the entire thing as is.  */
>> +      if ((fncode == RS6000_BIF_DISASSEMBLE_ACC
>> +	   && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_quad_type_node)
>> +	  || (fncode == RS6000_BIF_DISASSEMBLE_PAIR_V
>> +	      && TREE_TYPE (TREE_TYPE (dst_ptr)) == vector_pair_type_node))
>> +	{
>> +	  tree dst = build_simple_mem_ref (build1 (VIEW_CONVERT_EXPR,
>> +						   src_type, dst_ptr));
>> +	  gimplify_assign (dst, src, &new_seq);
>> +	  pop_gimplify_context (NULL);
>> +	  gsi_replace_with_seq (gsi, new_seq, true);
>> +	  return true;
>> +	}
>> +
>> +      /* If we're disassembling an accumulator into a different type, we need
>> +	 to emit a xxmfacc instruction now, since we cannot do it later.  */
>> +      if (fncode == RS6000_BIF_DISASSEMBLE_ACC)
>> +	{
>> +	  new_decl = rs6000_builtin_decls_x[RS6000_BIF_XXMFACC_INTERNAL];
>> +	  new_call = gimple_build_call (new_decl, 1, src);
>> +	  src = create_tmp_reg_or_ssa_name (vector_quad_type_node);
>> +	  gimple_call_set_lhs (new_call, src);
>> +	  gimple_seq_add_stmt (&new_seq, new_call);
>> +	}
>> +
>> +      /* Copy the accumulator/pair vector by vector.  */
>> +      new_decl
>> +	= rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
>> +      tree dst_type = build_pointer_type_for_mode (unsigned_V16QI_type_node,
>> +						   ptr_mode, true);
>> +      tree dst_base = build1 (VIEW_CONVERT_EXPR, dst_type, dst_ptr);
>> +      for (unsigned i = 0; i < nvec; i++)
>> +	{
>> +	  unsigned index = WORDS_BIG_ENDIAN ? i : nvec - 1 - i;
>> +	  tree dst = build2 (MEM_REF, unsigned_V16QI_type_node, dst_base,
>> +			     build_int_cst (dst_type, index * 16));
>> +	  tree dstssa = create_tmp_reg_or_ssa_name (unsigned_V16QI_type_node);
>> +	  new_call = gimple_build_call (new_decl, 2, src,
>> +					build_int_cstu (uint16_type_node, i));
>> +	  gimple_call_set_lhs (new_call, dstssa);
>> +	  gimple_seq_add_stmt (&new_seq, new_call);
>> +	  gimplify_assign (dst, dstssa, &new_seq);
>> +	}
>> +      pop_gimplify_context (NULL);
>> +      gsi_replace_with_seq (gsi, new_seq, true);
>> +      return true;
>> +    }
>> +
>> +  /* Convert this built-in into an internal version that uses pass-by-value
>> +     arguments.  The internal built-in is found in the assoc_bif field.  */
>> +  new_decl = rs6000_builtin_decls_x[rs6000_builtin_info_x[fncode].assoc_bif];
>> +  tree lhs, op[MAX_MMA_OPERANDS];
>> +  tree acc = gimple_call_arg (stmt, 0);
>> +  push_gimplify_context (true);
>> +
>> +  if (bif_is_quad (*bd))
>> +    {
>> +      /* This built-in has a pass-by-reference accumulator input, so load it
>> +	 into a temporary accumulator for use as a pass-by-value input.  */
>> +      op[0] = create_tmp_reg_or_ssa_name (vector_quad_type_node);
>> +      for (unsigned i = 1; i < nopnds; i++)
>> +	op[i] = gimple_call_arg (stmt, i);
>> +      gimplify_assign (op[0], build_simple_mem_ref (acc), &new_seq);
>> +    }
>> +  else
>> +    {
>> +      /* This built-in does not use its pass-by-reference accumulator argument
>> +	 as an input argument, so remove it from the input list.  */
>> +      nopnds--;
>> +      for (unsigned i = 0; i < nopnds; i++)
>> +	op[i] = gimple_call_arg (stmt, i + 1);
>> +    }
>> +
>> +  switch (nopnds)
>> +    {
>> +    case 0:
>> +      new_call = gimple_build_call (new_decl, 0);
>> +      break;
>> +    case 1:
>> +      new_call = gimple_build_call (new_decl, 1, op[0]);
>> +      break;
>> +    case 2:
>> +      new_call = gimple_build_call (new_decl, 2, op[0], op[1]);
>> +      break;
>> +    case 3:
>> +      new_call = gimple_build_call (new_decl, 3, op[0], op[1], op[2]);
>> +      break;
>> +    case 4:
>> +      new_call = gimple_build_call (new_decl, 4, op[0], op[1], op[2], op[3]);
>> +      break;
>> +    case 5:
>> +      new_call = gimple_build_call (new_decl, 5, op[0], op[1], op[2], op[3],
>> +				    op[4]);
>> +      break;
>> +    case 6:
>> +      new_call = gimple_build_call (new_decl, 6, op[0], op[1], op[2], op[3],
>> +				    op[4], op[5]);
>> +      break;
>> +    case 7:
>> +      new_call = gimple_build_call (new_decl, 7, op[0], op[1], op[2], op[3],
>> +				    op[4], op[5], op[6]);
>> +      break;
>> +    default:
>> +      gcc_unreachable ();
>> +    }
>> +
>> +  if (fncode == RS6000_BIF_BUILD_PAIR || fncode == RS6000_BIF_ASSEMBLE_PAIR_V)
>> +    lhs = create_tmp_reg_or_ssa_name (vector_pair_type_node);
>> +  else
>> +    lhs = create_tmp_reg_or_ssa_name (vector_quad_type_node);
>> +  gimple_call_set_lhs (new_call, lhs);
>> +  gimple_seq_add_stmt (&new_seq, new_call);
>> +  gimplify_assign (build_simple_mem_ref (acc), lhs, &new_seq);
>> +  pop_gimplify_context (NULL);
>> +  gsi_replace_with_seq (gsi, new_seq, true);
>> +
>> +  return true;
>> +}
> ok
>
>> +
>> +/* Fold a machine-dependent built-in in GIMPLE.  (For folding into
>> +   a constant, use rs6000_fold_builtin.)  */
>> +static bool
>> +rs6000_gimple_fold_new_builtin (gimple_stmt_iterator *gsi)
>> +{
>> +  gimple *stmt = gsi_stmt (*gsi);
>> +  tree fndecl = gimple_call_fndecl (stmt);
>> +  gcc_checking_assert (fndecl && DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD);
>> +  enum rs6000_gen_builtins fn_code
>> +    = (enum rs6000_gen_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>> +  tree arg0, arg1, lhs, temp;
>> +  enum tree_code bcode;
>> +  gimple *g;
>> +
>> +  size_t uns_fncode = (size_t) fn_code;
>> +  enum insn_code icode = rs6000_builtin_info_x[uns_fncode].icode;
>> +  const char *fn_name1 = rs6000_builtin_info_x[uns_fncode].bifname;
>> +  const char *fn_name2 = (icode != CODE_FOR_nothing)
>> +			  ? get_insn_name ((int) icode)
>> +			  : "nothing";
>> +
>> +  if (TARGET_DEBUG_BUILTIN)
>> +      fprintf (stderr, "rs6000_gimple_fold_new_builtin %d %s %s\n",
>> +	       fn_code, fn_name1, fn_name2);
>> +
>> +  if (!rs6000_fold_gimple)
>> +    return false;
>> +
>> +  /* Prevent gimple folding for code that does not have a LHS, unless it is
>> +     allowed per the rs6000_new_builtin_valid_without_lhs helper function.  */
>> +  if (!gimple_call_lhs (stmt)
>> +      && !rs6000_new_builtin_valid_without_lhs (fn_code, fndecl))
>> +    return false;
>> +
>> +  /* Don't fold invalid builtins, let rs6000_expand_builtin diagnose it.  */
>> +  if (!rs6000_new_builtin_is_supported (fn_code))
>> +    return false;
>> +
>> +  if (rs6000_gimple_fold_new_mma_builtin (gsi, fn_code))
>> +    return true;
>> +
>> +  switch (fn_code)
>> +    {
>> +    /* Flavors of vec_add.  We deliberately don't expand
>> +       RS6000_BIF_VADDUQM as it gets lowered from V1TImode to
>> +       TImode, resulting in much poorer code generation.  */
>> +    case RS6000_BIF_VADDUBM:
>> +    case RS6000_BIF_VADDUHM:
>> +    case RS6000_BIF_VADDUWM:
>> +    case RS6000_BIF_VADDUDM:
>> +    case RS6000_BIF_VADDFP:
>> +    case RS6000_BIF_XVADDDP:
>> +    case RS6000_BIF_XVADDSP:
>> +      bcode = PLUS_EXPR;
>> +    do_binary:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (lhs)))
>> +	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (lhs))))
>> +	{
>> +	  /* Ensure the binary operation is performed in a type
>> +	     that wraps if it is integral type.  */
>> +	  gimple_seq stmts = NULL;
>> +	  tree type = unsigned_type_for (TREE_TYPE (lhs));
>> +	  tree uarg0 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +				     type, arg0);
>> +	  tree uarg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +				     type, arg1);
>> +	  tree res = gimple_build (&stmts, gimple_location (stmt), bcode,
>> +				   type, uarg0, uarg1);
>> +	  gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	  g = gimple_build_assign (lhs, VIEW_CONVERT_EXPR,
>> +				   build1 (VIEW_CONVERT_EXPR,
>> +					   TREE_TYPE (lhs), res));
>> +	  gsi_replace (gsi, g, true);
>> +	  return true;
>> +	}
>> +      g = gimple_build_assign (lhs, bcode, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_sub.  We deliberately don't expand
>> +       P8V_BUILTIN_VSUBUQM. */
>
> Is there a new name to use for VSUBUQM in that comment?

Yes!  Good catch.  Will fix that up.
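
Presumably just:

  /* Flavors of vec_sub.  We deliberately don't expand
     RS6000_BIF_VSUBUQM.  */
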
>
>
>> +    case RS6000_BIF_VSUBUBM:
>> +    case RS6000_BIF_VSUBUHM:
>> +    case RS6000_BIF_VSUBUWM:
>> +    case RS6000_BIF_VSUBUDM:
>> +    case RS6000_BIF_VSUBFP:
>> +    case RS6000_BIF_XVSUBDP:
>> +    case RS6000_BIF_XVSUBSP:
>> +      bcode = MINUS_EXPR;
>> +      goto do_binary;
>> +    case RS6000_BIF_XVMULSP:
>> +    case RS6000_BIF_XVMULDP:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, MULT_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Even element flavors of vec_mul (signed). */
>> +    case RS6000_BIF_VMULESB:
>> +    case RS6000_BIF_VMULESH:
>> +    case RS6000_BIF_VMULESW:
>> +    /* Even element flavors of vec_mul (unsigned).  */
>> +    case RS6000_BIF_VMULEUB:
>> +    case RS6000_BIF_VMULEUH:
>> +    case RS6000_BIF_VMULEUW:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_EVEN_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Odd element flavors of vec_mul (signed).  */
>> +    case RS6000_BIF_VMULOSB:
>> +    case RS6000_BIF_VMULOSH:
>> +    case RS6000_BIF_VMULOSW:
>> +    /* Odd element flavors of vec_mul (unsigned). */
>> +    case RS6000_BIF_VMULOUB:
>> +    case RS6000_BIF_VMULOUH:
>> +    case RS6000_BIF_VMULOUW:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, VEC_WIDEN_MULT_ODD_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_div (Integer).  */
>> +    case RS6000_BIF_DIV_V2DI:
>> +    case RS6000_BIF_UDIV_V2DI:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, TRUNC_DIV_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_div (Float).  */
>> +    case RS6000_BIF_XVDIVSP:
>> +    case RS6000_BIF_XVDIVDP:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, RDIV_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_and.  */
>> +    case RS6000_BIF_VAND_V16QI_UNS:
>> +    case RS6000_BIF_VAND_V16QI:
>> +    case RS6000_BIF_VAND_V8HI_UNS:
>> +    case RS6000_BIF_VAND_V8HI:
>> +    case RS6000_BIF_VAND_V4SI_UNS:
>> +    case RS6000_BIF_VAND_V4SI:
>> +    case RS6000_BIF_VAND_V2DI_UNS:
>> +    case RS6000_BIF_VAND_V2DI:
>> +    case RS6000_BIF_VAND_V4SF:
>> +    case RS6000_BIF_VAND_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_andc.  */
>> +    case RS6000_BIF_VANDC_V16QI_UNS:
>> +    case RS6000_BIF_VANDC_V16QI:
>> +    case RS6000_BIF_VANDC_V8HI_UNS:
>> +    case RS6000_BIF_VANDC_V8HI:
>> +    case RS6000_BIF_VANDC_V4SI_UNS:
>> +    case RS6000_BIF_VANDC_V4SI:
>> +    case RS6000_BIF_VANDC_V2DI_UNS:
>> +    case RS6000_BIF_VANDC_V2DI:
>> +    case RS6000_BIF_VANDC_V4SF:
>> +    case RS6000_BIF_VANDC_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
>> +      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +      g = gimple_build_assign (lhs, BIT_AND_EXPR, arg0, temp);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_nand.  */
>> +    case RS6000_BIF_NAND_V16QI_UNS:
>> +    case RS6000_BIF_NAND_V16QI:
>> +    case RS6000_BIF_NAND_V8HI_UNS:
>> +    case RS6000_BIF_NAND_V8HI:
>> +    case RS6000_BIF_NAND_V4SI_UNS:
>> +    case RS6000_BIF_NAND_V4SI:
>> +    case RS6000_BIF_NAND_V2DI_UNS:
>> +    case RS6000_BIF_NAND_V2DI:
>> +    case RS6000_BIF_NAND_V4SF:
>> +    case RS6000_BIF_NAND_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
>> +      g = gimple_build_assign (temp, BIT_AND_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_or.  */
>> +    case RS6000_BIF_VOR_V16QI_UNS:
>> +    case RS6000_BIF_VOR_V16QI:
>> +    case RS6000_BIF_VOR_V8HI_UNS:
>> +    case RS6000_BIF_VOR_V8HI:
>> +    case RS6000_BIF_VOR_V4SI_UNS:
>> +    case RS6000_BIF_VOR_V4SI:
>> +    case RS6000_BIF_VOR_V2DI_UNS:
>> +    case RS6000_BIF_VOR_V2DI:
>> +    case RS6000_BIF_VOR_V4SF:
>> +    case RS6000_BIF_VOR_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* flavors of vec_orc.  */
>> +    case RS6000_BIF_ORC_V16QI_UNS:
>> +    case RS6000_BIF_ORC_V16QI:
>> +    case RS6000_BIF_ORC_V8HI_UNS:
>> +    case RS6000_BIF_ORC_V8HI:
>> +    case RS6000_BIF_ORC_V4SI_UNS:
>> +    case RS6000_BIF_ORC_V4SI:
>> +    case RS6000_BIF_ORC_V2DI_UNS:
>> +    case RS6000_BIF_ORC_V2DI:
>> +    case RS6000_BIF_ORC_V4SF:
>> +    case RS6000_BIF_ORC_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
>> +      g = gimple_build_assign (temp, BIT_NOT_EXPR, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +      g = gimple_build_assign (lhs, BIT_IOR_EXPR, arg0, temp);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_xor.  */
>> +    case RS6000_BIF_VXOR_V16QI_UNS:
>> +    case RS6000_BIF_VXOR_V16QI:
>> +    case RS6000_BIF_VXOR_V8HI_UNS:
>> +    case RS6000_BIF_VXOR_V8HI:
>> +    case RS6000_BIF_VXOR_V4SI_UNS:
>> +    case RS6000_BIF_VXOR_V4SI:
>> +    case RS6000_BIF_VXOR_V2DI_UNS:
>> +    case RS6000_BIF_VXOR_V2DI:
>> +    case RS6000_BIF_VXOR_V4SF:
>> +    case RS6000_BIF_VXOR_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, BIT_XOR_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_nor.  */
>> +    case RS6000_BIF_VNOR_V16QI_UNS:
>> +    case RS6000_BIF_VNOR_V16QI:
>> +    case RS6000_BIF_VNOR_V8HI_UNS:
>> +    case RS6000_BIF_VNOR_V8HI:
>> +    case RS6000_BIF_VNOR_V4SI_UNS:
>> +    case RS6000_BIF_VNOR_V4SI:
>> +    case RS6000_BIF_VNOR_V2DI_UNS:
>> +    case RS6000_BIF_VNOR_V2DI:
>> +    case RS6000_BIF_VNOR_V4SF:
>> +    case RS6000_BIF_VNOR_V2DF:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
>> +      g = gimple_build_assign (temp, BIT_IOR_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* flavors of vec_abs.  */
>> +    case RS6000_BIF_ABS_V16QI:
>> +    case RS6000_BIF_ABS_V8HI:
>> +    case RS6000_BIF_ABS_V4SI:
>> +    case RS6000_BIF_ABS_V4SF:
>> +    case RS6000_BIF_ABS_V2DI:
>> +    case RS6000_BIF_XVABSDP:
>> +    case RS6000_BIF_XVABSSP:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      if (INTEGRAL_TYPE_P (TREE_TYPE (TREE_TYPE (arg0)))
>> +	  && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (TREE_TYPE (arg0))))
>> +	return false;
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, ABS_EXPR, arg0);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* flavors of vec_min.  */
>> +    case RS6000_BIF_XVMINDP:
>> +    case RS6000_BIF_XVMINSP:
>> +    case RS6000_BIF_VMINSD:
>> +    case RS6000_BIF_VMINUD:
>> +    case RS6000_BIF_VMINSB:
>> +    case RS6000_BIF_VMINSH:
>> +    case RS6000_BIF_VMINSW:
>> +    case RS6000_BIF_VMINUB:
>> +    case RS6000_BIF_VMINUH:
>> +    case RS6000_BIF_VMINUW:
>> +    case RS6000_BIF_VMINFP:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, MIN_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* flavors of vec_max.  */
>> +    case RS6000_BIF_XVMAXDP:
>> +    case RS6000_BIF_XVMAXSP:
>> +    case RS6000_BIF_VMAXSD:
>> +    case RS6000_BIF_VMAXUD:
>> +    case RS6000_BIF_VMAXSB:
>> +    case RS6000_BIF_VMAXSH:
>> +    case RS6000_BIF_VMAXSW:
>> +    case RS6000_BIF_VMAXUB:
>> +    case RS6000_BIF_VMAXUH:
>> +    case RS6000_BIF_VMAXUW:
>> +    case RS6000_BIF_VMAXFP:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, MAX_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_eqv.  */
>> +    case RS6000_BIF_EQV_V16QI:
>> +    case RS6000_BIF_EQV_V8HI:
>> +    case RS6000_BIF_EQV_V4SI:
>> +    case RS6000_BIF_EQV_V4SF:
>> +    case RS6000_BIF_EQV_V2DF:
>> +    case RS6000_BIF_EQV_V2DI:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      temp = create_tmp_reg_or_ssa_name (TREE_TYPE (arg1));
>> +      g = gimple_build_assign (temp, BIT_XOR_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +      g = gimple_build_assign (lhs, BIT_NOT_EXPR, temp);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +    /* Flavors of vec_rotate_left.  */
>> +    case RS6000_BIF_VRLB:
>> +    case RS6000_BIF_VRLH:
>> +    case RS6000_BIF_VRLW:
>> +    case RS6000_BIF_VRLD:
>> +      arg0 = gimple_call_arg (stmt, 0);
>> +      arg1 = gimple_call_arg (stmt, 1);
>> +      lhs = gimple_call_lhs (stmt);
>> +      g = gimple_build_assign (lhs, LROTATE_EXPR, arg0, arg1);
>> +      gimple_set_location (g, gimple_location (stmt));
>> +      gsi_replace (gsi, g, true);
>> +      return true;
>> +  /* Flavors of vector shift right algebraic.
>> +     vec_sra{b,h,w} -> vsra{b,h,w}.  */
>> +    case RS6000_BIF_VSRAB:
>> +    case RS6000_BIF_VSRAH:
>> +    case RS6000_BIF_VSRAW:
>> +    case RS6000_BIF_VSRAD:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	lhs = gimple_call_lhs (stmt);
>> +	tree arg1_type = TREE_TYPE (arg1);
>> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
>> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
>> +	location_t loc = gimple_location (stmt);
>> +	/* Force arg1 into the range valid matching the arg0 type.  */
>> +	/* Build a vector consisting of the max valid bit-size values.  */
>> +	int n_elts = VECTOR_CST_NELTS (arg1);
>> +	tree element_size = build_int_cst (unsigned_element_type,
>> +					   128 / n_elts);
>> +	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
>> +	for (int i = 0; i < n_elts; i++)
>> +	  elts.safe_push (element_size);
>> +	tree modulo_tree = elts.build ();
>> +	/* Modulo the provided shift value against that vector.  */
>> +	gimple_seq stmts = NULL;
>> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +					   unsigned_arg1_type, arg1);
>> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
>> +				      unsigned_arg1_type, unsigned_arg1,
>> +				      modulo_tree);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	/* And finally, do the shift.  */
>> +	g = gimple_build_assign (lhs, RSHIFT_EXPR, arg0, new_arg1);
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +   /* Flavors of vector shift left.
>> +      builtin_altivec_vsl{b,h,w} -> vsl{b,h,w}.  */
>> +    case RS6000_BIF_VSLB:
>> +    case RS6000_BIF_VSLH:
>> +    case RS6000_BIF_VSLW:
>> +    case RS6000_BIF_VSLD:
>> +      {
>> +	location_t loc;
>> +	gimple_seq stmts = NULL;
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	tree arg0_type = TREE_TYPE (arg0);
>> +	if (INTEGRAL_TYPE_P (TREE_TYPE (arg0_type))
>> +	    && !TYPE_OVERFLOW_WRAPS (TREE_TYPE (arg0_type)))
>> +	  return false;
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	tree arg1_type = TREE_TYPE (arg1);
>> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
>> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
>> +	loc = gimple_location (stmt);
>> +	lhs = gimple_call_lhs (stmt);
>> +	/* Force arg1 into the range valid matching the arg0 type.  */
>> +	/* Build a vector consisting of the max valid bit-size values.  */
>> +	int n_elts = VECTOR_CST_NELTS (arg1);
>> +	int tree_size_in_bits = TREE_INT_CST_LOW (size_in_bytes (arg1_type))
>> +				* BITS_PER_UNIT;
>> +	tree element_size = build_int_cst (unsigned_element_type,
>> +					   tree_size_in_bits / n_elts);
>> +	tree_vector_builder elts (unsigned_type_for (arg1_type), n_elts, 1);
>> +	for (int i = 0; i < n_elts; i++)
>> +	  elts.safe_push (element_size);
>> +	tree modulo_tree = elts.build ();
>> +	/* Modulo the provided shift value against that vector.  */
>> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +					   unsigned_arg1_type, arg1);
>> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
>> +				      unsigned_arg1_type, unsigned_arg1,
>> +				      modulo_tree);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	/* And finally, do the shift.  */
>> +	g = gimple_build_assign (lhs, LSHIFT_EXPR, arg0, new_arg1);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +    /* Flavors of vector shift right.  */
>> +    case RS6000_BIF_VSRB:
>> +    case RS6000_BIF_VSRH:
>> +    case RS6000_BIF_VSRW:
>> +    case RS6000_BIF_VSRD:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	lhs = gimple_call_lhs (stmt);
>> +	tree arg1_type = TREE_TYPE (arg1);
>> +	tree unsigned_arg1_type = unsigned_type_for (TREE_TYPE (arg1));
>> +	tree unsigned_element_type = unsigned_type_for (TREE_TYPE (arg1_type));
>> +	location_t loc = gimple_location (stmt);
>> +	gimple_seq stmts = NULL;
>> +	/* Convert arg0 to unsigned.  */
>> +	tree arg0_unsigned
>> +	  = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +			  unsigned_type_for (TREE_TYPE (arg0)), arg0);
>> +	/* Force arg1 into the range valid matching the arg0 type.  */
>> +	/* Build a vector consisting of the max valid bit-size values.  */
>> +	int n_elts = VECTOR_CST_NELTS (arg1);
>> +	tree element_size = build_int_cst (unsigned_element_type,
>> +					   128 / n_elts);
>> +	tree_vector_builder elts (unsigned_arg1_type, n_elts, 1);
>> +	for (int i = 0; i < n_elts; i++)
>> +	  elts.safe_push (element_size);
>> +	tree modulo_tree = elts.build ();
>> +	/* Modulo the provided shift value against that vector.  */
>> +	tree unsigned_arg1 = gimple_build (&stmts, VIEW_CONVERT_EXPR,
>> +					   unsigned_arg1_type, arg1);
>> +	tree new_arg1 = gimple_build (&stmts, loc, TRUNC_MOD_EXPR,
>> +				      unsigned_arg1_type, unsigned_arg1,
>> +				      modulo_tree);
>> +	/* Do the shift.  */
>> +	tree res
>> +	  = gimple_build (&stmts, RSHIFT_EXPR,
>> +			  TREE_TYPE (arg0_unsigned), arg0_unsigned, new_arg1);
>> +	/* Convert result back to the lhs type.  */
>> +	res = gimple_build (&stmts, VIEW_CONVERT_EXPR, TREE_TYPE (lhs), res);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	replace_call_with_value (gsi, res);
>> +	return true;
>> +      }
>> +    /* Vector loads.  */
>> +    case RS6000_BIF_LVX_V16QI:
>> +    case RS6000_BIF_LVX_V8HI:
>> +    case RS6000_BIF_LVX_V4SI:
>> +    case RS6000_BIF_LVX_V4SF:
>> +    case RS6000_BIF_LVX_V2DI:
>> +    case RS6000_BIF_LVX_V2DF:
>> +    case RS6000_BIF_LVX_V1TI:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);  // offset
>> +	arg1 = gimple_call_arg (stmt, 1);  // address
>> +	lhs = gimple_call_lhs (stmt);
>> +	location_t loc = gimple_location (stmt);
>> +	/* Since arg1 may be cast to a different type, just use ptr_type_node
>> +	   here instead of trying to enforce TBAA on pointer types.  */
>> +	tree arg1_type = ptr_type_node;
>> +	tree lhs_type = TREE_TYPE (lhs);
>> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
>> +	   the tree using the value from arg0.  The resulting type will match
>> +	   the type of arg1.  */
>> +	gimple_seq stmts = NULL;
>> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +				       arg1_type, arg1, temp_offset);
>> +	/* Mask off any lower bits from the address.  */
>> +	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>> +					  arg1_type, temp_addr,
>> +					  build_int_cst (arg1_type, -16));
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	if (!is_gimple_mem_ref_addr (aligned_addr))
>> +	  {
>> +	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
>> +	    gimple *g = gimple_build_assign (t, aligned_addr);
>> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +	    aligned_addr = t;
>> +	  }
>> +	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
>> +	   take an offset, but since we've already incorporated the offset
>> +	   above, here we just pass in a zero.  */
>> +	gimple *g
>> +	  = gimple_build_assign (lhs, build2 (MEM_REF, lhs_type, aligned_addr,
>> +					      build_int_cst (arg1_type, 0)));
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +    /* Vector stores.  */
>> +    case RS6000_BIF_STVX_V16QI:
>> +    case RS6000_BIF_STVX_V8HI:
>> +    case RS6000_BIF_STVX_V4SI:
>> +    case RS6000_BIF_STVX_V4SF:
>> +    case RS6000_BIF_STVX_V2DI:
>> +    case RS6000_BIF_STVX_V2DF:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
>> +	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
>> +	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
>> +	location_t loc = gimple_location (stmt);
>> +	tree arg0_type = TREE_TYPE (arg0);
>> +	/* Use ptr_type_node (no TBAA) for the arg2_type.
>> +	   FIXME: (Richard)  "A proper fix would be to transition this type as
>> +	   seen from the frontend to GIMPLE, for example in a similar way we
>> +	   do for MEM_REFs by piggy-backing that on an extra argument, a
>> +	   constant zero pointer of the alias pointer type to use (which would
>> +	   also serve as a type indicator of the store itself).  I'd use a
>> +	   target specific internal function for this (not sure if we can have
>> +	   those target specific, but I guess if it's folded away then that's
>> +	   fine) and get away with the overload set."  */
>> +	tree arg2_type = ptr_type_node;
>> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
>> +	   the tree using the value from arg0.  The resulting type will match
>> +	   the type of arg2.  */
>> +	gimple_seq stmts = NULL;
>> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +				       arg2_type, arg2, temp_offset);
>> +	/* Mask off any lower bits from the address.  */
>> +	tree aligned_addr = gimple_build (&stmts, loc, BIT_AND_EXPR,
>> +					  arg2_type, temp_addr,
>> +					  build_int_cst (arg2_type, -16));
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	if (!is_gimple_mem_ref_addr (aligned_addr))
>> +	  {
>> +	    tree t = make_ssa_name (TREE_TYPE (aligned_addr));
>> +	    gimple *g = gimple_build_assign (t, aligned_addr);
>> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +	    aligned_addr = t;
>> +	  }
>> +	/* The desired gimple result should be similar to:
>> +	   MEM[(__vector floatD.1407 *)_1] = vf1D.2697;  */
>> +	gimple *g
>> +	  = gimple_build_assign (build2 (MEM_REF, arg0_type, aligned_addr,
>> +					 build_int_cst (arg2_type, 0)), arg0);
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* unaligned Vector loads.  */
>> +    case RS6000_BIF_LXVW4X_V16QI:
>> +    case RS6000_BIF_LXVW4X_V8HI:
>> +    case RS6000_BIF_LXVW4X_V4SF:
>> +    case RS6000_BIF_LXVW4X_V4SI:
>> +    case RS6000_BIF_LXVD2X_V2DF:
>> +    case RS6000_BIF_LXVD2X_V2DI:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);  // offset
>> +	arg1 = gimple_call_arg (stmt, 1);  // address
>> +	lhs = gimple_call_lhs (stmt);
>> +	location_t loc = gimple_location (stmt);
>> +	/* Since arg1 may be cast to a different type, just use ptr_type_node
>> +	   here instead of trying to enforce TBAA on pointer types.  */
>> +	tree arg1_type = ptr_type_node;
>> +	tree lhs_type = TREE_TYPE (lhs);
>> +	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
>> +	  required alignment (power) is 4 bytes regardless of data type.  */
>> +	tree align_ltype = build_aligned_type (lhs_type, 4);
>> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
>> +	   the tree using the value from arg0.  The resulting type will match
>> +	   the type of arg1.  */
>> +	gimple_seq stmts = NULL;
>> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg0);
>> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +				       arg1_type, arg1, temp_offset);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	if (!is_gimple_mem_ref_addr (temp_addr))
>> +	  {
>> +	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
>> +	    gimple *g = gimple_build_assign (t, temp_addr);
>> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +	    temp_addr = t;
>> +	  }
>> +	/* Use the build2 helper to set up the mem_ref.  The MEM_REF could also
>> +	   take an offset, but since we've already incorporated the offset
>> +	   above, here we just pass in a zero.  */
>> +	gimple *g;
>> +	g = gimple_build_assign (lhs, build2 (MEM_REF, align_ltype, temp_addr,
>> +					      build_int_cst (arg1_type, 0)));
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* unaligned Vector stores.  */
>> +    case RS6000_BIF_STXVW4X_V16QI:
>> +    case RS6000_BIF_STXVW4X_V8HI:
>> +    case RS6000_BIF_STXVW4X_V4SF:
>> +    case RS6000_BIF_STXVW4X_V4SI:
>> +    case RS6000_BIF_STXVD2X_V2DF:
>> +    case RS6000_BIF_STXVD2X_V2DI:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0); /* Value to be stored.  */
>> +	arg1 = gimple_call_arg (stmt, 1); /* Offset.  */
>> +	tree arg2 = gimple_call_arg (stmt, 2); /* Store-to address.  */
>> +	location_t loc = gimple_location (stmt);
>> +	tree arg0_type = TREE_TYPE (arg0);
>> +	/* Use ptr_type_node (no TBAA) for the arg2_type.  */
>> +	tree arg2_type = ptr_type_node;
>> +	/* In GIMPLE the type of the MEM_REF specifies the alignment.  The
>> +	   required alignment (power) is 4 bytes regardless of data type.  */
>> +	tree align_stype = build_aligned_type (arg0_type, 4);
>> +	/* POINTER_PLUS_EXPR wants the offset to be of type 'sizetype'.  Create
>> +	   the tree using the value from arg1.  */
>> +	gimple_seq stmts = NULL;
>> +	tree temp_offset = gimple_convert (&stmts, loc, sizetype, arg1);
>> +	tree temp_addr = gimple_build (&stmts, loc, POINTER_PLUS_EXPR,
>> +				       arg2_type, arg2, temp_offset);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	if (!is_gimple_mem_ref_addr (temp_addr))
>> +	  {
>> +	    tree t = make_ssa_name (TREE_TYPE (temp_addr));
>> +	    gimple *g = gimple_build_assign (t, temp_addr);
>> +	    gsi_insert_before (gsi, g, GSI_SAME_STMT);
>> +	    temp_addr = t;
>> +	  }
>> +	gimple *g;
>> +	g = gimple_build_assign (build2 (MEM_REF, align_stype, temp_addr,
>> +					 build_int_cst (arg2_type, 0)), arg0);
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* Vector Fused multiply-add (fma).  */
>> +    case RS6000_BIF_VMADDFP:
>> +    case RS6000_BIF_XVMADDDP:
>> +    case RS6000_BIF_XVMADDSP:
> I notice that XVMADDSP was missing in the original.
Right -- something I caught when making these changes.  Similar with the 
unsigned vperm cases later.

Thanks very much for the review!
Bill
>
>> +    case RS6000_BIF_VMLADDUHM:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	tree arg2 = gimple_call_arg (stmt, 2);
>> +	lhs = gimple_call_lhs (stmt);
>> +	gcall *g = gimple_build_call_internal (IFN_FMA, 3, arg0, arg1, arg2);
>> +	gimple_call_set_lhs (g, lhs);
>> +	gimple_call_set_nothrow (g, true);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* Vector compares; EQ, NE, GE, GT, LE.  */
>> +    case RS6000_BIF_VCMPEQUB:
>> +    case RS6000_BIF_VCMPEQUH:
>> +    case RS6000_BIF_VCMPEQUW:
>> +    case RS6000_BIF_VCMPEQUD:
>> +    /* We deliberately omit RS6000_BIF_VCMPEQUT for now, because gimple
>> +       folding produces worse code for 128-bit compares.  */
> ok
>
>> +      fold_compare_helper (gsi, EQ_EXPR, stmt);
>> +      return true;
>> +
>> +    case RS6000_BIF_VCMPNEB:
>> +    case RS6000_BIF_VCMPNEH:
>> +    case RS6000_BIF_VCMPNEW:
>> +    /* We deliberately omit RS6000_BIF_VCMPNET for now, because gimple
>> +       folding produces worse code for 128-bit compares.  */
>> +      fold_compare_helper (gsi, NE_EXPR, stmt);
>> +      return true;
>> +
>> +    case RS6000_BIF_CMPGE_16QI:
>> +    case RS6000_BIF_CMPGE_U16QI:
>> +    case RS6000_BIF_CMPGE_8HI:
>> +    case RS6000_BIF_CMPGE_U8HI:
>> +    case RS6000_BIF_CMPGE_4SI:
>> +    case RS6000_BIF_CMPGE_U4SI:
>> +    case RS6000_BIF_CMPGE_2DI:
>> +    case RS6000_BIF_CMPGE_U2DI:
>> +    /* We deliberately omit RS6000_BIF_CMPGE_1TI and RS6000_BIF_CMPGE_U1TI
>> +       for now, because gimple folding produces worse code for 128-bit
>> +       compares.  */
>> +      fold_compare_helper (gsi, GE_EXPR, stmt);
>> +      return true;
>> +
>> +    case RS6000_BIF_VCMPGTSB:
>> +    case RS6000_BIF_VCMPGTUB:
>> +    case RS6000_BIF_VCMPGTSH:
>> +    case RS6000_BIF_VCMPGTUH:
>> +    case RS6000_BIF_VCMPGTSW:
>> +    case RS6000_BIF_VCMPGTUW:
>> +    case RS6000_BIF_VCMPGTUD:
>> +    case RS6000_BIF_VCMPGTSD:
>> +    /* We deliberately omit RS6000_BIF_VCMPGTUT and RS6000_BIF_VCMPGTST
>> +       for now, because gimple folding produces worse code for 128-bit
>> +       compares.  */
>> +      fold_compare_helper (gsi, GT_EXPR, stmt);
>> +      return true;
>> +
>> +    case RS6000_BIF_CMPLE_16QI:
>> +    case RS6000_BIF_CMPLE_U16QI:
>> +    case RS6000_BIF_CMPLE_8HI:
>> +    case RS6000_BIF_CMPLE_U8HI:
>> +    case RS6000_BIF_CMPLE_4SI:
>> +    case RS6000_BIF_CMPLE_U4SI:
>> +    case RS6000_BIF_CMPLE_2DI:
>> +    case RS6000_BIF_CMPLE_U2DI:
>> +    /* We deliberately omit RS6000_BIF_CMPLE_1TI and RS6000_BIF_CMPLE_U1TI
>> +       for now, because gimple folding produces worse code for 128-bit
>> +       compares.  */
>> +      fold_compare_helper (gsi, LE_EXPR, stmt);
>> +      return true;
>> +
>> +    /* flavors of vec_splat_[us]{8,16,32}.  */
>> +    case RS6000_BIF_VSPLTISB:
>> +    case RS6000_BIF_VSPLTISH:
>> +    case RS6000_BIF_VSPLTISW:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	lhs = gimple_call_lhs (stmt);
>> +
>> +	/* Only fold the vec_splat_*() if the lower bits of arg 0 is a
>> +	   5-bit signed constant in range -16 to +15.  */
>> +	if (TREE_CODE (arg0) != INTEGER_CST
>> +	    || !IN_RANGE (TREE_INT_CST_LOW (arg0), -16, 15))
>> +	  return false;
>> +	gimple_seq stmts = NULL;
>> +	location_t loc = gimple_location (stmt);
>> +	tree splat_value = gimple_convert (&stmts, loc,
>> +					   TREE_TYPE (TREE_TYPE (lhs)), arg0);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	tree splat_tree = build_vector_from_val (TREE_TYPE (lhs), splat_value);
>> +	g = gimple_build_assign (lhs, splat_tree);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* Flavors of vec_splat.  */
>> +    /* a = vec_splat (b, 0x3) becomes a = { b[3],b[3],b[3],...};  */
>> +    case RS6000_BIF_VSPLTB:
>> +    case RS6000_BIF_VSPLTH:
>> +    case RS6000_BIF_VSPLTW:
>> +    case RS6000_BIF_XXSPLTD_V2DI:
>> +    case RS6000_BIF_XXSPLTD_V2DF:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0); /* input vector.  */
>> +	arg1 = gimple_call_arg (stmt, 1); /* index into arg0.  */
>> +	/* Only fold the vec_splat_*() if arg1 is both a constant value and
>> +	   is a valid index into the arg0 vector.  */
>> +	unsigned int n_elts = VECTOR_CST_NELTS (arg0);
>> +	if (TREE_CODE (arg1) != INTEGER_CST
>> +	    || TREE_INT_CST_LOW (arg1) > (n_elts -1))
>> +	  return false;
>> +	lhs = gimple_call_lhs (stmt);
>> +	tree lhs_type = TREE_TYPE (lhs);
>> +	tree arg0_type = TREE_TYPE (arg0);
>> +	tree splat;
>> +	if (TREE_CODE (arg0) == VECTOR_CST)
>> +	  splat = VECTOR_CST_ELT (arg0, TREE_INT_CST_LOW (arg1));
>> +	else
>> +	  {
>> +	    /* Determine (in bits) the length and start location of the
>> +	       splat value for a call to the tree_vec_extract helper.  */
>> +	    int splat_elem_size = TREE_INT_CST_LOW (size_in_bytes (arg0_type))
>> +				  * BITS_PER_UNIT / n_elts;
>> +	    int splat_start_bit = TREE_INT_CST_LOW (arg1) * splat_elem_size;
>> +	    tree len = build_int_cst (bitsizetype, splat_elem_size);
>> +	    tree start = build_int_cst (bitsizetype, splat_start_bit);
>> +	    splat = tree_vec_extract (gsi, TREE_TYPE (lhs_type), arg0,
>> +				      len, start);
>> +	  }
>> +	/* And finally, build the new vector.  */
>> +	tree splat_tree = build_vector_from_val (lhs_type, splat);
>> +	g = gimple_build_assign (lhs, splat_tree);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* vec_mergel (integrals).  */
>> +    case RS6000_BIF_VMRGLH:
>> +    case RS6000_BIF_VMRGLW:
>> +    case RS6000_BIF_XXMRGLW_4SI:
>> +    case RS6000_BIF_VMRGLB:
>> +    case RS6000_BIF_VEC_MERGEL_V2DI:
>> +    case RS6000_BIF_XXMRGLW_4SF:
>> +    case RS6000_BIF_VEC_MERGEL_V2DF:
>> +      fold_mergehl_helper (gsi, stmt, 1);
>> +      return true;
>> +    /* vec_mergeh (integrals).  */
>> +    case RS6000_BIF_VMRGHH:
>> +    case RS6000_BIF_VMRGHW:
>> +    case RS6000_BIF_XXMRGHW_4SI:
>> +    case RS6000_BIF_VMRGHB:
>> +    case RS6000_BIF_VEC_MERGEH_V2DI:
>> +    case RS6000_BIF_XXMRGHW_4SF:
>> +    case RS6000_BIF_VEC_MERGEH_V2DF:
>> +      fold_mergehl_helper (gsi, stmt, 0);
>> +      return true;
>> +
>> +    /* Flavors of vec_mergee.  */
>> +    case RS6000_BIF_VMRGEW_V4SI:
>> +    case RS6000_BIF_VMRGEW_V2DI:
>> +    case RS6000_BIF_VMRGEW_V4SF:
>> +    case RS6000_BIF_VMRGEW_V2DF:
>> +      fold_mergeeo_helper (gsi, stmt, 0);
>> +      return true;
>> +    /* Flavors of vec_mergeo.  */
>> +    case RS6000_BIF_VMRGOW_V4SI:
>> +    case RS6000_BIF_VMRGOW_V2DI:
>> +    case RS6000_BIF_VMRGOW_V4SF:
>> +    case RS6000_BIF_VMRGOW_V2DF:
>> +      fold_mergeeo_helper (gsi, stmt, 1);
>> +      return true;
>> +
>> +    /* d = vec_pack (a, b) */
>> +    case RS6000_BIF_VPKUDUM:
>> +    case RS6000_BIF_VPKUHUM:
>> +    case RS6000_BIF_VPKUWUM:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	lhs = gimple_call_lhs (stmt);
>> +	gimple *g = gimple_build_assign (lhs, VEC_PACK_TRUNC_EXPR, arg0, arg1);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    /* d = vec_unpackh (a) */
>> +    /* Note that the UNPACK_{HI,LO}_EXPR used in the gimple_build_assign call
>> +       in this code is sensitive to endian-ness, and needs to be inverted to
>> +       handle both LE and BE targets.  */
>> +    case RS6000_BIF_VUPKHSB:
>> +    case RS6000_BIF_VUPKHSH:
>> +    case RS6000_BIF_VUPKHSW:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	lhs = gimple_call_lhs (stmt);
>> +	if (BYTES_BIG_ENDIAN)
>> +	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
>> +	else
>> +	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +    /* d = vec_unpackl (a) */
>> +    case RS6000_BIF_VUPKLSB:
>> +    case RS6000_BIF_VUPKLSH:
>> +    case RS6000_BIF_VUPKLSW:
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	lhs = gimple_call_lhs (stmt);
>> +	if (BYTES_BIG_ENDIAN)
>> +	  g = gimple_build_assign (lhs, VEC_UNPACK_LO_EXPR, arg0);
>> +	else
>> +	  g = gimple_build_assign (lhs, VEC_UNPACK_HI_EXPR, arg0);
>> +	gimple_set_location (g, gimple_location (stmt));
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +    /* There is no gimple type corresponding with pixel, so just return.  */
>> +    case RS6000_BIF_VUPKHPX:
>> +    case RS6000_BIF_VUPKLPX:
>> +      return false;
>> +
>> +    /* vec_perm.  */
>> +    case RS6000_BIF_VPERM_16QI:
>> +    case RS6000_BIF_VPERM_8HI:
>> +    case RS6000_BIF_VPERM_4SI:
>> +    case RS6000_BIF_VPERM_2DI:
>> +    case RS6000_BIF_VPERM_4SF:
>> +    case RS6000_BIF_VPERM_2DF:
>> +    case RS6000_BIF_VPERM_16QI_UNS:
>> +    case RS6000_BIF_VPERM_8HI_UNS:
>> +    case RS6000_BIF_VPERM_4SI_UNS:
>> +    case RS6000_BIF_VPERM_2DI_UNS:
> Noting that the _UNS entries are new with respect to the original code.
> ok.
>
>> +      {
>> +	arg0 = gimple_call_arg (stmt, 0);
>> +	arg1 = gimple_call_arg (stmt, 1);
>> +	tree permute = gimple_call_arg (stmt, 2);
>> +	lhs = gimple_call_lhs (stmt);
>> +	location_t loc = gimple_location (stmt);
>> +	gimple_seq stmts = NULL;
>> +	// convert arg0 and arg1 to match the type of the permute
>> +	// for the VEC_PERM_EXPR operation.
>> +	tree permute_type = (TREE_TYPE (permute));
>> +	tree arg0_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
>> +					permute_type, arg0);
>> +	tree arg1_ptype = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
>> +					permute_type, arg1);
>> +	tree lhs_ptype = gimple_build (&stmts, loc, VEC_PERM_EXPR,
>> +				      permute_type, arg0_ptype, arg1_ptype,
>> +				      permute);
>> +	// Convert the result back to the desired lhs type upon completion.
>> +	tree temp = gimple_build (&stmts, loc, VIEW_CONVERT_EXPR,
>> +				  TREE_TYPE (lhs), lhs_ptype);
>> +	gsi_insert_seq_before (gsi, stmts, GSI_SAME_STMT);
>> +	g = gimple_build_assign (lhs, temp);
>> +	gimple_set_location (g, loc);
>> +	gsi_replace (gsi, g, true);
>> +	return true;
>> +      }
>> +
>> +    default:
>> +      if (TARGET_DEBUG_BUILTIN)
>> +	fprintf (stderr, "gimple builtin intrinsic not matched:%d %s %s\n",
>> +		 fn_code, fn_name1, fn_name2);
>> +      break;
>> +    }
>> +
>> +  return false;
>> +}
>
>
> lgtm
> thanks
> -Will
>
>
>> +
>>   /* Expand an expression EXP that calls a built-in function,
>>      with result going to TARGET if that's convenient
>>      (and in mode MODE if that's convenient).
>


* Re: [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza
  2021-09-01 16:13 ` [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza Bill Schmidt
  2021-09-13 17:53   ` will schmidt
@ 2021-09-16 22:52   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-09-16 22:52 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:38AM -0500, Bill Schmidt wrote:
> I over-restricted use of __builtin_mffsl, since I was unaware that it
> automatically uses mffs when mffsl is not available.  Paul Clarke pointed
> this out in discussion of his SSE 4.1 compatibility patches.

Right.  Do we need to document this better?  There are more builtins
that can generate code for older archs than you might expect (like,
set_fpscr_rn).

Hrm, it *is* documented, but in a big wall of text.  Not sure we can do
much better though; there simply are this many builtins.  But maybe you
have an idea how to arrange things better?

Anyway: okay for trunk.  Thanks!


Segher

* Re: [PATCH 03/18] rs6000: Handle gimple folding of target built-ins
  2021-09-01 16:13 ` [PATCH 03/18] rs6000: Handle gimple folding of target built-ins Bill Schmidt
  2021-09-13 18:42   ` will schmidt
@ 2021-09-16 22:58   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-09-16 22:58 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:39AM -0500, Bill Schmidt wrote:
> This is another patch that looks bigger than it really is.  Because we
> have a new namespace for the builtins, allowing us to have both the old
> and new builtin infrastructure supported at once, we need versions of
> these functions that use the new builtin namespace.  Otherwise the code is
> unchanged.

I'll just blindly approve it, given that Will has slogged through it all
already :-)

> 	* config/rs6000/rs6000-call.c (rs6000_gimple_fold_new_builtin):
> 	New forward decl.

Changelog margins are at 80 chars, not 72.

Okay for trunk (w/ the fixes from Will's review).  Thanks!


Segher

* Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes
  2021-09-01 16:13 ` [PATCH 04/18] rs6000: Handle some recent MMA builtin changes Bill Schmidt
  2021-09-13 19:02   ` will schmidt
@ 2021-09-16 23:38   ` Segher Boessenkool
  2021-09-17 15:14     ` Bill Schmidt
  1 sibling, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-09-16 23:38 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:40AM -0500, Bill Schmidt wrote:
> Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
> __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
> I had been using to automate gimple folding of MMA builtins.  Previously,
> every MMA function that could be folded had an associated internal function
> that it was folded into.  The LXVP/STXVP builtins are just folded directly
> into memory operations.
> 
> Instead of relying on this pattern, this patch adds a new attribute to
> builtins called "mmaint," which is set for all MMA builtins that have an
> associated internal builtin.  The naming convention that adds _INTERNAL to
> the builtin index name remains.
> 
> The rest of the patch is just duplicating Peter's patch, using the new
> builtin infrastructure.

> 	* config/rs6000/rs6000-call.c
> 	(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
> 	RS6000_BIF_STXVP.

It is fine to end a changelog line in a colon.

> +  else if (fncode == RS6000_BIF_LXVP)
> +    {
> +      push_gimplify_context (true);
> +      tree offset = gimple_call_arg (stmt, 0);
> +      tree ptr = gimple_call_arg (stmt, 1);
> +      tree lhs = gimple_call_lhs (stmt);
> +      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
> +	ptr = build1 (VIEW_CONVERT_EXPR,
> +		      build_pointer_type (vector_pair_type_node), ptr);
> +      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
> +					       TREE_TYPE (ptr), ptr, offset));
> +      gimplify_assign (lhs, mem, &new_seq);
> +      pop_gimplify_context (NULL);
> +      gsi_replace_with_seq (gsi, new_seq, true);
> +      return true;
> +    }

Fwiw, all those cases return, so those "else" are not needed.  Also it
would be nice if this could be factored a bit better, hrm.

Is that "if" in there useful?  Maybe add a helper function for it, then?

Anyway: okay for trunk.  Thanks!


Segher

* Re: [PATCH 05/18] rs6000: Support for vectorizing built-in functions
  2021-09-01 16:13 ` [PATCH 05/18] rs6000: Support for vectorizing built-in functions Bill Schmidt
  2021-09-13 19:29   ` will schmidt
@ 2021-09-17 12:17   ` Segher Boessenkool
  1 sibling, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-09-17 12:17 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:41AM -0500, Bill Schmidt wrote:
> This patch just duplicates a couple of functions and adjusts them to use the
> new builtin names.  There's no logical change otherwise.

> +/* Returns a function decl for a vectorized version of the builtin function
> +   with builtin function code FN and the result vector type TYPE, or NULL_TREE
> +   if it is not available.  */
> +
> +static tree
> +rs6000_new_builtin_vectorized_function (unsigned int fn, tree type_out,
> +					tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr, "rs6000_new_builtin_vectorized_function (%s, %s, %s)\n",
> +	     combined_fn_name (combined_fn (fn)),
> +	     GET_MODE_NAME (TYPE_MODE (type_out)),
> +	     GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +      || TREE_CODE (type_in) != VECTOR_TYPE)
> +    return NULL_TREE;

This is not described in the function comment.  Should it?  Should this
be here at all?  Should it be an assert instead?

It also should say it implements the
TARGET_VECTORIZE_BUILTIN_VECTORIZED_FUNCTION macro?

> +static tree
> +rs6000_new_builtin_md_vectorized_function (tree fndecl, tree type_out,
> +					   tree type_in)
> +{
> +  machine_mode in_mode, out_mode;
> +  int in_n, out_n;
> +
> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr,
> +	     "rs6000_new_builtin_md_vectorized_function (%s, %s, %s)\n",
> +	     IDENTIFIER_POINTER (DECL_NAME (fndecl)),
> +	     GET_MODE_NAME (TYPE_MODE (type_out)),
> +	     GET_MODE_NAME (TYPE_MODE (type_in)));
> +
> +  if (TREE_CODE (type_out) != VECTOR_TYPE
> +      || TREE_CODE (type_in) != VECTOR_TYPE)
> +    return NULL_TREE;

Here it definitely should be an assert, the documentation of this hook
says so.

Other than that this is fine of course (or not worse than what there
already was, anyway ;-) ).  So put this on the big "one day we will
clean this up" list?

Okay for trunk.  Thanks!


Segher

* Re: [PATCH 04/18] rs6000: Handle some recent MMA builtin changes
  2021-09-16 23:38   ` Segher Boessenkool
@ 2021-09-17 15:14     ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-09-17 15:14 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

Thanks!  I'll remove the elses in the committed patch, along with a TODO 
comment for the additional factoring opportunity for when I get to that 
stage.

Thanks for all the reviews!
Bill

On 9/16/21 6:38 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 01, 2021 at 11:13:40AM -0500, Bill Schmidt wrote:
>> Peter Bergner recently added two new builtins __builtin_vsx_lxvp and
>> __builtin_vsx_stxvp.  These happened to break a pattern in MMA builtins that
>> I had been using to automate gimple folding of MMA builtins.  Previously,
>> every MMA function that could be folded had an associated internal function
>> that it was folded into.  The LXVP/STXVP builtins are just folded directly
>> into memory operations.
>>
>> Instead of relying on this pattern, this patch adds a new attribute to
>> builtins called "mmaint," which is set for all MMA builtins that have an
>> associated internal builtin.  The naming convention that adds _INTERNAL to
>> the builtin index name remains.
>>
>> The rest of the patch is just duplicating Peter's patch, using the new
>> builtin infrastructure.
>> 	* config/rs6000/rs6000-call.c
>> 	(rs6000_gimple_fold_new_mma_builtin): Handle RS6000_BIF_LXVP and
>> 	RS6000_BIF_STXVP.
> It is fine to end a changelog line in a colon.
>
>> +  else if (fncode == RS6000_BIF_LXVP)
>> +    {
>> +      push_gimplify_context (true);
>> +      tree offset = gimple_call_arg (stmt, 0);
>> +      tree ptr = gimple_call_arg (stmt, 1);
>> +      tree lhs = gimple_call_lhs (stmt);
>> +      if (TREE_TYPE (TREE_TYPE (ptr)) != vector_pair_type_node)
>> +	ptr = build1 (VIEW_CONVERT_EXPR,
>> +		      build_pointer_type (vector_pair_type_node), ptr);
>> +      tree mem = build_simple_mem_ref (build2 (POINTER_PLUS_EXPR,
>> +					       TREE_TYPE (ptr), ptr, offset));
>> +      gimplify_assign (lhs, mem, &new_seq);
>> +      pop_gimplify_context (NULL);
>> +      gsi_replace_with_seq (gsi, new_seq, true);
>> +      return true;
>> +    }
> Fwiw, all those cases return, so those "else" are not needed.  Also it
> would be nice if this could be factored a bit better, hrm.
>
> Is that "if" in there useful?  Maybe add a helper function for it, then?
>
> Anyway: okay for trunk.  Thanks!
>
>
> Segher


* Re: [PATCH 06/18] rs6000: Builtin expansion, part 1
  2021-09-01 16:13 ` [PATCH 06/18] rs6000: Builtin expansion, part 1 Bill Schmidt
@ 2021-10-31  3:24   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-10-31  3:24 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:42AM -0500, Bill Schmidt wrote:
> Differences between the old and new support in this patch include:
>  - Make use of the new builtin data structures, directly looking up
>    a function's information rather than searching for the function
>    multiple times;

Is that measurable, do you think?

>  - Test for enablement of builtins at expand time, to support #pragma
>    target changes within a compilation unit;

But not within a function, right?

> Note that these six patches must be pushed together, because otherwise
> unused parameter warnings in the stub functions will prevent bootstrap.

Must be *committed* as one commit, even.  Committing them as six
separate commits and pushing them all at once will do nothing for people
who try to bisect, etc.

Merging patches is easy with Git though :-)

> +/* Expand ALTIVEC_BUILTIN_MASK_FOR_LOAD.  */
> +rtx
> +rs6000_expand_ldst_mask (rtx target, tree arg0)
> + {
> +  return target;
> + }

Interesting leading spaces, heh.  Please fix.

> +/* Expand the HTM builtin in EXP and store the result in TARGET.  */
> +static rtx
> +new_htm_expand_builtin (bifdata *bifaddr, rs6000_gen_builtins fcode,
> +			tree exp, rtx target)
> +{
> +  return const0_rtx;
> +}

The function commment should say what the return value is.

> +/* Expand an expression EXP that calls a built-in function,
> +   with result going to TARGET if that's convenient
> +   (and in mode MODE if that's convenient).
> +   SUBTARGET may be used as the target for computing one of EXP's operands.
> +   IGNORE is nonzero if the value is to be ignored.
> +   Use the new builtin infrastructure.  */
> +static rtx
> +rs6000_expand_new_builtin (tree exp, rtx target,
> +			   rtx subtarget ATTRIBUTE_UNUSED,
> +			   machine_mode ignore_mode ATTRIBUTE_UNUSED,
> +			   int ignore ATTRIBUTE_UNUSED)

Don't use ATTRIBUTE_UNUSED?  We have C++ now, you can leave out the
parameter name, with the same effect (other than it does not make you go
blind ;-) ).  In the case where you still want to show the name, you can
do something like
rs6000_expand_new_builtin (tree exp, rtx target, rtx /*subtarget*/,
			    machine_mode, int /*ignore*/)

(There is no argument MODE btw, the comment needs some tweaking).

> +  /* We have two different modes (KFmode, TFmode) that are the IEEE
> +     128-bit floating point type, depending on whether long double is the
> +     IBM extended double (KFmode) or long double is IEEE 128-bit (TFmode).

KFmode *always* is IEEE QP.  TFmode is the one that can be different.

> +     It is simpler if we only define one variant of the built-in function,
> +     and switch the code when defining it, rather than defining two built-
> +     ins and using the overload table in rs6000-c.c to switch between the
> +     two.  If we don't have the proper assembler, don't do this switch
> +     because CODE_FOR_*kf* and CODE_FOR_*tf* will be CODE_FOR_nothing.  */
> +  if (FLOAT128_IEEE_P (TFmode))
> +    switch (icode)
> +      {
> +      default:
> +	break;

default: goes at the *end*.  And you can usually leave it out.

> +      case CODE_FOR_sqrtkf2_odd:
> +	icode = CODE_FOR_sqrttf2_odd;
> +	break;

So please do this the other way?  In libgcc "tf" means double-double,
it is historical.  So let's do the clearer thing please: translate tf to
kf in this handling (when tf *does* mean kf ;-) )

> +  /* In case of "#pragma target" changes, we initialize all builtins
> +     but check for actual availability now, during expand time.  For
> +     invalid builtins, generate a normal call.  */
> +  bifdata *bifaddr = &rs6000_builtin_info_x[uns_fcode];
> +  bif_enable e = bifaddr->enable;
> +
> +  if (e != ENB_ALWAYS
> +      && (e != ENB_P5       || !TARGET_POPCNTB)

  if (!(e == ENB_ALWAYS
	|| (e == ENB_P5 && TARGET_POPCNTB)

etc.

Computers are better at De Morgan than humans are.  It is much more
important to write clear code.  This often means using fewer negations.

> +  const int MAX_BUILTIN_ARGS = 6;
> +  tree arg[MAX_BUILTIN_ARGS];
> +  rtx op[MAX_BUILTIN_ARGS];
> +  machine_mode mode[MAX_BUILTIN_ARGS + 1];

Arrays are always better with a short comment.
Why is the "mode" array one entry longer, btw?

> +  rtx pat;
> +  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
> +  int k;

These things should be declared where they are first used.  The
initialisation of void_func should not be hidden amidst boring
declarations.

"k" needs a comment.

> +      if (!insn_data[icode].operand[i+k].mode)
> +	mode[i+k] = TARGET_64BIT ? Pmode : SImode;

That is
  mode[i+k] = Pmode;
always.

Does this depend on VOIDmode being equal to 0?  That is guaranteed, but
if you write out VOIDmode elsewhere, do it here as well?

> +      else
> +	mode[i+k] = insn_data[icode].operand[i+k].mode;
> +    }

So this is
  mode[i+k] = insn_data[icode].operand[i+k].mode;
  if (!mode[i+k])
    mode[i+k] = Pmode;

> +      switch (bifaddr->restr[i])
> +	{
> +	default:

default: goes at the end.

> +	case RES_BITS:
> +	  {
> +	    size_t mask = (1 << bifaddr->restr_val1[i]) - 1;

1 is an int, it can overflow much before a size_t would.
  size_t mask = 1;
  mask <<= bifaddr->restr_val1[i];
  mask--;

> +	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
> +	    STRIP_NOPS (restr_arg);
> +	    if (TREE_CODE (restr_arg) != INTEGER_CST
> +		|| TREE_INT_CST_LOW (restr_arg) & ~mask)

Manual De Morgan again?  More later too, ugh.  Well at least these
are trivial either way.

> +	case RES_VAR_RANGE:
> +	  {
> +	    tree restr_arg = arg[bifaddr->restr_opnd[i] - 1];
> +	    STRIP_NOPS (restr_arg);
> +	    if (TREE_CODE (restr_arg) == INTEGER_CST
> +		&& !IN_RANGE (tree_to_shwi (restr_arg),
> +			      bifaddr->restr_val1[i],
> +			      bifaddr->restr_val2[i]))
> +	      {
> +		error ("argument %d must be a variable or a literal "
> +		       "between %d and %d, inclusive",
> +		       bifaddr->restr_opnd[i], bifaddr->restr_val1[i],
> +		       bifaddr->restr_val2[i]);
> +		return CONST0_RTX (mode[0]);
> +	      }
> +	    break;
> +	  }

This error check is incongruent with the rest, and with its error
message?  If it is not an INTEGER_CST, it does not check anything about
it.  That sounds like trouble later.

> +  if (fcode == RS6000_BIF_PACK_IF
> +      && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)

Curious line breaks.  Break before each &&, or don't break before the
first one?

> +    {
> +      icode = CODE_FOR_packtf;
> +      fcode = RS6000_BIF_PACK_TF;

The fcode was for IF; can you use TF now?

> +      uns_fcode = (size_t)fcode;
> +    }

(space after cast)

> +  else if (fcode == RS6000_BIF_UNPACK_IF
> +	   && TARGET_LONG_DOUBLE_128 && !TARGET_IEEEQUAD)
> +    {
> +      icode = CODE_FOR_unpacktf;
> +      fcode = RS6000_BIF_UNPACK_TF;
> +      uns_fcode = (size_t)fcode;
> +    }

(same issues)

> +  switch (nargs)
> +    {
> +    default:

Just Say No.  Don't write  0 == err  and don't put default: first.  It
does not improve your code.  It makes it worse, instead.  Yoda can
understand Yoda-speak, it comes natural to him.  Yoda is not amongst the
readers of your code though, so please write in the common idiom :-)


This is okay for trunk with these things fixed.  Thanks!


Segher

* Re: [PATCH 07/18] rs6000: Builtin expansion, part 2
  2021-09-01 16:13 ` [PATCH 07/18] rs6000: Builtin expansion, part 2 Bill Schmidt
@ 2021-11-01 12:18   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-01 12:18 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:43AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-call.c (rs6000_invalid_new_builtin):
> 	Implement.

That fits on one line.  Don't wrap early, esp. not if that leaves a
colon without anything following it on that line: it looks like
something is missing.

> 	(rs6000_expand_ldst_mask): Likewise.
> 	(rs6000_init_builtins): Initialize altivec_builtin_mask_for_load.


>  static void
>  rs6000_invalid_new_builtin (enum rs6000_gen_builtins fncode)
>  {
> +  size_t uns_fncode = (size_t) fncode;

Like in the previous patch, the "uns_*" name made me think "you do not
need an explicit cast, the assignment will do that automatically".  But
of course it does not matter that this is unsigned at all: the cast is
casting an enum to a number, which in C++ does require a cast.

So maybe you can think of some better name?  Something like "j" is fine
with me as well btw, it's nice and short, and it is clear you do not
want more meaning ;-)

> +  switch (rs6000_builtin_info_x[uns_fncode].enable)

> +    case ENB_P6:
> +      error ("%qs requires the %qs option", name, "-mcpu=power6");
> +      break;

> +    case ENB_CELL:
> +      error ("%qs is only valid for the cell processor", name);
> +      break;

Maybe  "%qs requires the %qs option", name, "-mcpu=cell"  ?  Boring is
good ;-)

> +    };

(This is  switch (...) { ... };  )
Stray semi.  Was there no warning?

>  rtx
>  rs6000_expand_ldst_mask (rtx target, tree arg0)
>   {
> +  int icode2 = BYTES_BIG_ENDIAN

You do not need a line break here.

> +    ? (int) CODE_FOR_altivec_lvsr_direct
> +    : (int) CODE_FOR_altivec_lvsl_direct;

You can align the ? and : just fine without it.

> +  rtx op, addr, pat;

Don't declare such things early.

Okay for trunk with those things fixed.  Thanks!


Segher

* Re: [PATCH 08/18] rs6000: Builtin expansion, part 3
  2021-09-01 16:13 ` [PATCH 08/18] rs6000: Builtin expansion, part 3 Bill Schmidt
@ 2021-11-03  1:15   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-03  1:15 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:44AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-call.c (new_cpu_expand_builtin):
> 	Implement.

(just one line)

> @@ -14646,6 +14646,108 @@ static rtx
>  new_cpu_expand_builtin (enum rs6000_gen_builtins fcode,
>  			tree exp ATTRIBUTE_UNUSED, rtx target)
>  {
> +  /* __builtin_cpu_init () is a nop, so expand to nothing.  */
> +  if (fcode == RS6000_BIF_CPU_INIT)
> +    return const0_rtx;
> +
> +  if (target == 0 || GET_MODE (target) != SImode)
> +    target = gen_reg_rtx (SImode);
> +
> +#ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB

It would make sense to put this #ifdef in a separate function (it
certainly is big enough for that, that is reason enough ;-) ), and then
you can probably do it without #ifdef more easily as a bonus.  That's a
future improvement of course.

In general, any function that is unwieldily big should have pieces
factored out.  A good time to do that is if you would be touching it
anyway (as a separate patch, before the other stuff most likely).

The patch is okay for trunk (w/ the changelog nit fixed :-) )  Thanks!


Segher

* Re: [PATCH 09/18] rs6000: Builtin expansion, part 4
  2021-09-01 16:13 ` [PATCH 09/18] rs6000: Builtin expansion, part 4 Bill Schmidt
@ 2021-11-03  1:52   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-03  1:52 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:45AM -0500, Bill Schmidt wrote:
>  static insn_code
>  elemrev_icode (rs6000_gen_builtins fcode)
>  {
> +  switch (fcode)
> +    {
> +    default:
> +      gcc_unreachable ();

default: goes at the end.

> +    case RS6000_BIF_ST_ELEMREV_V1TI:
> +      return BYTES_BIG_ENDIAN
> +	? CODE_FOR_vsx_store_v1ti
> +	: CODE_FOR_vsx_st_elemrev_v1ti;

That fits on one or two lines.  Many more like that, I won't point them
all out.

Alternatively, maybe nicer, put an "if (BYTES_BIG_ENDIAN)" before the
switch, and duplicate the switch.  It probably is more readable that
way, easier to spot if you missed or typoed something.
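
E.g. (a sketch only; all but one case elided):

  if (BYTES_BIG_ENDIAN)
    switch (fcode)
      {
      case RS6000_BIF_ST_ELEMREV_V1TI:
        return CODE_FOR_vsx_store_v1ti;
      /* ...likewise for the remaining cases...  */
      }
  else
    switch (fcode)
      {
      case RS6000_BIF_ST_ELEMREV_V1TI:
        return CODE_FOR_vsx_st_elemrev_v1ti;
      /* ...likewise for the remaining cases...  */
      }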

> +    }
> +  gcc_unreachable ();

So the default: has no use at all!  We do the same after the switch
already.

>    return (insn_code) 0;

And neither does this line.  Leave it out, the gcc_unreachable tells GCC
that this path does not have to return a value (because this path can
never be taken).  Put a blank line before the gcc_unreachable though,
nice and dramatic right before the function-closing curly ;-)

>  static rtx
>  ldv_expand_builtin (rtx target, insn_code icode, rtx *op, machine_mode tmode)
>  {
> +  rtx pat, addr;
> +  bool blk = (icode == CODE_FOR_altivec_lvlx
> +	      || icode == CODE_FOR_altivec_lvlxl
> +	      || icode == CODE_FOR_altivec_lvrx
> +	      || icode == CODE_FOR_altivec_lvrxl);

"blk" is used 32 lines later.  Maybe a better name would help?  Maybe
you can declare (and init) the variable later?  Maybe a tiny comment
will help?

Apparently it means to use BLKmode instead of tmode, but why is that?

> +  if (target == 0
> +      || GET_MODE (target) != tmode
> +      || !insn_data[icode].operand[0].predicate (target, tmode))
> +    target = gen_reg_rtx (tmode);

And here it uses tmode anyway.  Hrm.

> +  /* For LVX, express the RTL accurately by ANDing the address with -16.
> +     LVXL and LVE*X expand to use UNSPECs to hide their special behavior,
> +     so the raw address is fine.  */

In the case of lvxl the unspec is not around the memory address, so this
is not true.

Oh, for lve* the same:

(define_insn "altivec_lve<VI_char>x"
  [(parallel
    [(set (match_operand:VI 0 "register_operand" "=v")
          (match_operand:VI 1 "memory_operand" "Z"))
     (unspec [(const_int 0)] UNSPEC_LVE)])]
  "TARGET_ALTIVEC"
  "lve<VI_char>x %0,%y1"
  [(set_attr "type" "vecload")])

The "set" is just plain.  It is paralleled with an unspec so it is not
the same RTL as some other load, but it is perfectly visible to all RTL
optimisers.

There needs to be an unspec in the "set" itself, like already done for
stve*: lve* leaves most of the vector undefined, but the RTL does not
express that currently, and then things can go wrong.  I cannot think of
an example where it *will* go wrong, but that does not say much :-)
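
Something in this direction, perhaps (a sketch only, not tested,
mirroring what stve* already does):

(define_insn "altivec_lve<VI_char>x"
  [(set (match_operand:VI 0 "register_operand" "=v")
        (unspec:VI [(match_operand:VI 1 "memory_operand" "Z")]
                   UNSPEC_LVE))]
  "TARGET_ALTIVEC"
  "lve<VI_char>x %0,%y1"
  [(set_attr "type" "vecload")])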

So, all VMX-style loads and stores need the &-16 .

We survived this for ages, and it is not like lve* is such a hotly used
builtin these days, so we'll survive things, but: put it on a to-do
list somewhere?  :-)

> +  /* Emit the lxvr*x insn.  */
> +  pat = GEN_FCN (icode) (tiscratch, addr);

(declare it here, "rtx pat", not much earlier)

Okay for trunk with whatever tidyings you feel you can make now, and
leave the rest for a later day.  Thanks!


Segher

* Re: [PATCH 10/18] rs6000: Builtin expansion, part 5
  2021-09-01 16:13 ` [PATCH 10/18] rs6000: Builtin expansion, part 5 Bill Schmidt
@ 2021-11-04  0:55   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-04  0:55 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:46AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-call.c (new_mma_expand_builtin):
> 	Implement.

Same comment as all previous here :-)

>  new_mma_expand_builtin (tree exp, rtx target, insn_code icode,
>  			rs6000_gen_builtins fcode)
>  {
> +  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
> +  tree arg;
> +  call_expr_arg_iterator iter;
> +  const struct insn_operand_data *insn_op;
> +  rtx op[MAX_MMA_OPERANDS];
> +  unsigned nopnds = 0;
> +  bool void_func = TREE_TYPE (TREE_TYPE (fndecl)) == void_type_node;
> +  machine_mode tmode = VOIDmode;

All those declarations could be much later, making things easier to read
and understand (and review and check :-) ).

> +  if (!void_func)
> +    {
> +      tmode = insn_data[icode].operand[0].mode;
> +      if (!target
> +	  || GET_MODE (target) != tmode
> +	  || !insn_data[icode].operand[0].predicate (target, tmode))

Inverted logic again.

> +      if (!insn_op->predicate (opnd, insn_op->mode))
> +	{
> +	  if (!strcmp (insn_op->constraint, "n"))

Is looking at the constraint string as text optimal here?  Is it even
correct?  Hrm.  It should have a comment explaining it, at least.

> +    case 5:
> +      /* The ASSEMBLE builtin source operands are reversed in little-endian
> +	 mode, so reorder them.  */
> +      if (fcode == RS6000_BIF_ASSEMBLE_ACC_INTERNAL && !WORDS_BIG_ENDIAN)
> +	{
> +	  std::swap (op[1], op[4]);
> +	  std::swap (op[2], op[3]);
> +	}

Where is this done in the "old" code?  I probably am just looking with
unsuitable body parts for seeing, but :-)

> +  if (!pat)
> +    return NULL_RTX;
> +  emit_insn (pat);
> +
>    return target;

I'd put a blank line before that emit_insn.  For dramatic tension, if
nothing else ;-)  (It reads better imo).

Okay for trunk.  Thanks!


Segher

* Re: [PATCH 11/18] rs6000: Builtin expansion, part 6
  2021-09-01 16:13 ` [PATCH 11/18] rs6000: Builtin expansion, part 6 Bill Schmidt
@ 2021-11-04  1:24   ` Segher Boessenkool
  2021-11-07 15:28     ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-04  1:24 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:47AM -0500, Bill Schmidt wrote:
> Provide replacements for htm_spr_num and htm_expand_builtin.  No logic
> changes are intended here, as usual.  Much code was factored out into
> rs6000_expand_new_builtin, so the new version of htm_expand_builtin is
> a little tidier.

Nice.

> Also implement the support for the "endian" and "32bit" attributes,
> which is straightforward.  These just do icode substitution.

Don't call this "attributes" please?  I don't know what would be a
better name, of course.  "bif attribute" maybe?

> +  rtx op[MAX_HTM_OPERANDS], pat;

Don't declare arrays and scalars in the same statement, in general.  It
is important that the arrays stand out.

Also, don't declare things before they are used please.

> +  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
> +    {
> +      if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
> +	return const0_rtx;
> +
> +      insn_op = &insn_data[icode].operand[nopnds];
> +      op[nopnds] = expand_normal (arg);
> +
> +      if (!insn_op->predicate (op[nopnds], insn_op->mode))
> +	{
> +	  if (!strcmp (insn_op->constraint, "n"))
> +	    {
> +	      int arg_num = (nonvoid) ? nopnds : nopnds + 1;

Please don't parenthesise random expressions like "nonvoid".  I wonder
if that can be handled more simply by just unshifting a void_node into the
operands, btw :-)

And the same "n" thing as before of course.  Since it is the same: some
factoring would be helpful probably.
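
For instance (hypothetical name, a sketch only):

  /* Return true if insn operand INSN_OP only accepts integer literals,
     i.e. has the "n" constraint.  */
  static bool
  insn_op_wants_literal_p (const struct insn_operand_data *insn_op)
  {
    return strcmp (insn_op->constraint, "n") == 0;
  }

so the HTM and MMA expanders can share that check.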

> +      machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;

Superfluous parens.  This is just "word_mode", anyway?

> +  /* If this builtin accesses a CR, then pass in a scratch
> +     CR as the last operand.  */
> +  else if (bif_is_htmcr (*bifaddr))
> +    {
> +      cr = gen_reg_rtx (CCmode);
> +      op[nopnds++] = cr;
> +    }

There is only one CR ("condition register").  You can say CRF here
("condition register field", a 4-bit thing), or just cc or CC maybe
("condition code").  A pet peeve, I know.

> +  if (bif_is_endian (*bifaddr) && BYTES_BIG_ENDIAN)

"is_endian" should maybe be "is_bigendian" or something like that?

Okay for trunk with the changes you see fit at this time.  Thanks!


Segher

* Re: [PATCH 12/18] rs6000: Update rs6000_builtin_decl
  2021-09-01 16:13 ` [PATCH 12/18] rs6000: Update rs6000_builtin_decl Bill Schmidt
@ 2021-11-05 20:27   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 20:27 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:48AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-call.c (rs6000_new_builtin_decl): New
> 	function.

One line, etc.

> +static tree
> +rs6000_new_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)

No ATTRIBUTE_UNUSED please.

Okay for trunk with those trivialities fixed.  Thanks!


Segher

* Re: [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x
  2021-09-01 16:13 ` [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x Bill Schmidt
@ 2021-11-05 20:36   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 20:36 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:49AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000.c (rs6000_builtin_reciprocal): Use
> 	rs6000_builtin_decls_x when appropriate.
> 	(add_condition_to_bb): Likewise.
> 	(rs6000_atomic_assign_expand_fenv): Likewise.

> +  tree predicate_decl
> +    = (new_builtins_are_live
> +       ? rs6000_builtin_decls_x[(int) RS6000_BIF_CPU_SUPPORTS]
> +       : rs6000_builtin_decls [(int) RS6000_BUILTIN_CPU_SUPPORTS]);

Please don't randomly parenthesise stuff.  If something in emacs
complains about it, fix *that*?  Or complain back, etc. :-)

> -  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
> -  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
> +  tree mffs
> +    = (new_builtins_are_live
> +       ? rs6000_builtin_decls_x[RS6000_BIF_MFFS]
> +       : rs6000_builtin_decls[RS6000_BUILTIN_MFFS]);
> +  tree mtfsf
> +    = (new_builtins_are_live
> +       ? rs6000_builtin_decls_x[RS6000_BIF_MTFSF]
> +       : rs6000_builtin_decls[RS6000_BUILTIN_MTFSF]);
>    tree call_mffs = build_call_expr (mffs, 0);

Same here.

Okay for trunk with that fixed.  Thanks!


Segher

* Re: [PATCH 14/18] rs6000: Debug support
  2021-09-01 16:13 ` [PATCH 14/18] rs6000: Debug support Bill Schmidt
@ 2021-11-05 21:34   ` Segher Boessenkool
  2021-11-09 15:06     ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 21:34 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:50AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
> 	(def_builtin): Change debug formatting for easier parsing and
> 	include more information.
> 	(rs6000_init_builtins): Add dump of autogenerated builtins.
> 	(altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
> 	completeness.

>  /* Builtins.  */

Maybe change this header now?  It heads "def_builtin" before this patch,
maybe it should just move back there?  And as any function comment, it
should describe the function arguments (and return value, but that is
void there :-) )

> +/* Debug utility to translate a type node to a single token.  */

It returns a text string, instead.  The function name can be better:

> +static
> +const char *rs6000_debug_type (tree type)

"debug_" suggests this function outputs something to stderr.

> +{
> +  if (type == void_type_node)
> +    return "void";
> +  else if (type == long_integer_type_node)
> +    return "long";
> +  else if (type == long_unsigned_type_node)
> +    return "ulong";
> +  else if (type == long_long_integer_type_node)
> +    return "longlong";
> +  else if (type == long_long_unsigned_type_node)
> +    return "ulonglong";
> +  else if (type == bool_V2DI_type_node)
> +    return "vbll";
> +  else if (type == bool_V4SI_type_node)
> +    return "vbi";
> +  else if (type == bool_V8HI_type_node)
> +    return "vbs";
> +  else if (type == bool_V16QI_type_node)
> +    return "vbc";
> +  else if (type == bool_int_type_node)
> +    return "bool";
> +  else if (type == dfloat64_type_node)
> +    return "_Decimal64";
> +  else if (type == double_type_node)
> +    return "double";
> +  else if (type == intDI_type_node)
> +    return "sll";
> +  else if (type == intHI_type_node)
> +    return "ss";
> +  else if (type == ibm128_float_type_node)
> +    return "__ibm128";
> +  else if (type == opaque_V4SI_type_node)
> +    return "opaque";
> +  else if (POINTER_TYPE_P (type))
> +    return "void*";
> +  else if (type == intQI_type_node || type == char_type_node)
> +    return "sc";
> +  else if (type == dfloat32_type_node)
> +    return "_Decimal32";
> +  else if (type == float_type_node)
> +    return "float";
> +  else if (type == intSI_type_node || type == integer_type_node)
> +    return "si";
> +  else if (type == dfloat128_type_node)
> +    return "_Decimal128";
> +  else if (type == long_double_type_node)
> +    return "longdouble";
> +  else if (type == intTI_type_node)
> +    return "sq";
> +  else if (type == unsigned_intDI_type_node)
> +    return "ull";
> +  else if (type == unsigned_intHI_type_node)
> +    return "us";
> +  else if (type == unsigned_intQI_type_node)
> +    return "uc";
> +  else if (type == unsigned_intSI_type_node)
> +    return "ui";
> +  else if (type == unsigned_intTI_type_node)
> +    return "uq";
> +  else if (type == unsigned_V1TI_type_node)
> +    return "vuq";
> +  else if (type == unsigned_V2DI_type_node)
> +    return "vull";
> +  else if (type == unsigned_V4SI_type_node)
> +    return "vui";
> +  else if (type == unsigned_V8HI_type_node)
> +    return "vus";
> +  else if (type == unsigned_V16QI_type_node)
> +    return "vuc";
> +  else if (type == V16QI_type_node)
> +    return "vsc";
> +  else if (type == V1TI_type_node)
> +    return "vsq";
> +  else if (type == V2DF_type_node)
> +    return "vd";
> +  else if (type == V2DI_type_node)
> +    return "vsll";
> +  else if (type == V4SF_type_node)
> +    return "vf";
> +  else if (type == V4SI_type_node)
> +    return "vsi";
> +  else if (type == V8HI_type_node)
> +    return "vss";
> +  else if (type == pixel_V8HI_type_node)
> +    return "vp";
> +  else if (type == pcvoid_type_node)
> +    return "voidc*";
> +  else if (type == float128_type_node)
> +    return "_Float128";
> +  else if (type == vector_pair_type_node)
> +    return "__vector_pair";
> +  else if (type == vector_quad_type_node)
> +    return "__vector_quad";
> +  else
> +    return "unknown";
> +}

Please use a switch statement for this.  You can call the variable
"type_node" then as well, which would be a good idea.

>    if (TARGET_DEBUG_BUILTIN)
> -    fprintf (stderr, "rs6000_builtin, code = %4d, %s%s\n",
> -	     (int)code, name, attr_string);
> +    {
> +      tree t = TREE_TYPE (type);
> +      fprintf (stderr, "%s %s (", rs6000_debug_type (t), name);
> +      t = TYPE_ARG_TYPES (type);
> +      while (t && TREE_VALUE (t) != void_type_node)

Can both 0 and void_type_node terminate a list here?

> +	{
> +	  fprintf (stderr, "%s",
> +		   rs6000_debug_type (TREE_VALUE (t)));

This easily fits on one line.

> +	  t = TREE_CHAIN (t);
> +	  if (t && TREE_VALUE (t) != void_type_node)
> +	    fprintf (stderr, ", ");

It is easier to use a "bool first" extra var, you do not need to write
the same condition twice that way.

  bool first = true;
  while (...)
    {
      if (!first)
	fprintf ...;
      first = false;

      rest of loop body;
    }


> +	}
> +      fprintf (stderr, "); %s [%4d]\n", attr_string, (int)code);
> +    }

(space after cast)

>  
>  static const struct builtin_compatibility bdesc_compat[] =
> @@ -16097,6 +16209,67 @@ rs6000_init_builtins (void)
>    /* Execute the autogenerated initialization code for builtins.  */
>    rs6000_init_generated_builtins ();
>  
> +  if (TARGET_DEBUG_BUILTIN)
> +     {

(misindent)

> +	  if (e == ENB_P10_64 && (!TARGET_POWER10 || !TARGET_POWERPC64))

	  if (e == ENB_P10_64 && !(TARGET_POWER10 && TARGET_POWERPC64))

It even is shorter in this case ;-)

> +  if (TARGET_DEBUG_BUILTIN)
> +    fprintf (stderr, "%s __builtin_altivec_mask_for_load (%s); [%4d]\n",
> +	     rs6000_debug_type (TREE_TYPE (v16qi_ftype_pcvoid)),
> +	     rs6000_debug_type (TREE_VALUE
> +				(TYPE_ARG_TYPES (v16qi_ftype_pcvoid))),

Never start a line with a paren from a function call.  Often using an
extra variable is the best solution?

Okay for trunk with those things touched up.  Thanks!


Segher

* Re: [PATCH 15/18] rs6000: Update altivec.h for automated interfaces
  2021-09-01 16:13 ` [PATCH 15/18] rs6000: Update altivec.h for automated interfaces Bill Schmidt
@ 2021-11-05 22:08   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 22:08 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:51AM -0500, Bill Schmidt wrote:
> gcc/
> 	* config/rs6000/altivec.h: Delete a number of #defines that are
> 	now superfluous.  Alphabetize.  Include rs6000-vecdefines.h.
> 	Include some synonyms.

	* config/rs6000/altivec.h: Delete a number of #defines that are now
	superfluous.  Alphabetize.  Include rs6000-vecdefines.h.  Include some
	synonyms.

This looks good, but I cannot easily check any of it.  And I doubt the
testsuite tests if all of this is defined correctly.  How did you test
this?  Maybe you can add a testcase to check it?

Okay for trunk.  Thanks!


Segher

* Re: [PATCH 17/18] rs6000: Enable the new builtin support
  2021-09-01 16:13 ` [PATCH 17/18] rs6000: Enable the new builtin support Bill Schmidt
@ 2021-11-05 22:10   ` Segher Boessenkool
  0 siblings, 0 replies; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 22:10 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:53AM -0500, Bill Schmidt wrote:
> 	* config/rs6000/rs6000-gen-builtins.c (write_init_file):
> 	Initialize new_builtins_are_live to 1.

> --- a/gcc/config/rs6000/rs6000-gen-builtins.c
> +++ b/gcc/config/rs6000/rs6000-gen-builtins.c
> @@ -2791,7 +2791,7 @@ write_init_file (void)
>    fprintf (init_file, "#include \"rs6000-builtins.h\"\n");
>    fprintf (init_file, "\n");
>  
> -  fprintf (init_file, "int new_builtins_are_live = 0;\n\n");
> +  fprintf (init_file, "int new_builtins_are_live = 1;\n\n");
>  
>    fprintf (init_file, "tree rs6000_builtin_decls_x[RS6000_OVLD_MAX];\n\n");

... and everything still works after this?  Congrats!

Okay for trunk.  Thanks!


Segher

* Re: [PATCH 16/18] rs6000: Test case adjustments
  2021-09-01 16:13 ` [PATCH 16/18] rs6000: Test case adjustments Bill Schmidt
@ 2021-11-05 22:37   ` Segher Boessenkool
  2021-11-11 20:06     ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 22:37 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

On Wed, Sep 01, 2021 at 11:13:52AM -0500, Bill Schmidt wrote:
> 	* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust.

My favourite changelog entry!  But, adjust to what?  This is the first
line :-)

"Adjust expected error message"?

But you should fold this patch with some previous patch anyway, when
committing (or you break bisecting).

> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
> @@ -18,7 +18,7 @@ vector float test_fc ()
>  vector double testd_00 (vector double x) { return vec_splat (x, 0b00000); }
>  vector double testd_01 (vector double x) { return vec_splat (x, 0b00001); }
>  vector double test_dc ()
> -{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00010); }
> +{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00001); }
>  
>  /* If the source vector is a known constant, we will generate a load or possibly
>     XXSPLTIW.  */
> @@ -28,5 +28,5 @@ vector double test_dc ()
>  /* { dg-final { scan-assembler-times {\mvspltw\M|\mxxspltw\M} 3 } } */
>  
>  /* For double types, we will generate xxpermdi instructions.  */
> -/* { dg-final { scan-assembler-times "xxpermdi" 3 } } */
> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */

Why these changes?

> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
> @@ -9,23 +9,19 @@
>  
>  vector bool long long testb_00 (vector bool long long x) { return vec_splat (x, 0b00000); }
>  vector bool long long testb_01 (vector bool long long x) { return vec_splat (x, 0b00001); }
> -vector bool long long testb_02 (vector bool long long x) { return vec_splat (x, 0b00010); }
>  
>  vector signed long long tests_00 (vector signed long long x) { return vec_splat (x, 0b00000); }
>  vector signed long long tests_01 (vector signed long long x) { return vec_splat (x, 0b00001); }
> -vector signed long long tests_02 (vector signed long long x) { return vec_splat (x, 0b00010); }
>  
>  vector unsigned long long testu_00 (vector unsigned long long x) { return vec_splat (x, 0b00000); }
>  vector unsigned long long testu_01 (vector unsigned long long x) { return vec_splat (x, 0b00001); }
> -vector unsigned long long testu_02 (vector unsigned long long x) { return vec_splat (x, 0b00010); }
>  
>  /* Similar test as above, but the source vector is a known constant. */
> -vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00010); }
> -vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00010); }
> -vector unsigned long long test_ull () { const vector unsigned long long y = {56, 67}; return vec_splat (y, 0b00010); }
> +vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00001); }
> +vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00001); }
>  
>  /* Assorted load instructions for the initialization with known constants. */
> -/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M|\mxxspltib\M} 2 } } */
>  
>  /* xxpermdi for vec_splat of long long vectors.
>   At the time of this writing, the number of xxpermdi instructions

Ditto.

> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
> @@ -11,9 +11,9 @@
>  /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
>  /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
>  /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
> -/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
> -/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 24 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 26 } } */
> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 26 } } */
>  /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
>  /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */

And this?

> --- a/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
> @@ -126,6 +126,7 @@ void foo (vector signed char *vscr,
>  /* { dg-final { scan-assembler-times "vsubcuw" 4 } } */
>  /* { dg-final { scan-assembler-times "vsubuwm" 4 } } */
>  /* { dg-final { scan-assembler-times "vbpermq" 2 } } */
> +/* { dg-final { scan-assembler-times "vbpermd" 0 } } */
>  /* { dg-final { scan-assembler-times "xxleqv" 4 } } */
>  /* { dg-final { scan-assembler-times "vgbbd" 1 } } */
>  /* { dg-final { scan-assembler-times "xxlnand" 4 } } */

This curious one could have been a separate (obvious) patch.  It is a
bit out-of-place here.

> --- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
> @@ -19,6 +19,7 @@ test1 (vector int a, vector int b)
>  #pragma GCC target ("cpu=power7")
>  /* Force a re-read of altivec.h with new cpu target. */
>  #undef _ALTIVEC_H
> +#undef _RS6000_VECDEFINES_H
>  #include <altivec.h>

Wow ugly :-)  But nothing new here, heh.  Best not to look at testcase
internals too closely, in any case.

> --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
> @@ -1,5 +1,6 @@
>  /* { dg-do run { target { powerpc*-*-* } } } */
> -/* { dg-options "-O2 -std=c99" } */
> +/* { dg-options "-O2 -std=c99 -mcpu=power9" } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>  
>  #ifdef DEBUG
>  #include <stdio.h>

This one is a bug fix as well (and obvious).

> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
> @@ -12,5 +12,5 @@ test_all_not_equal_and_not_zero (vector unsigned short *arg1_p,
>    vector unsigned short arg_2 = *arg2_p;
>  
>    return __builtin_vec_vcmpnez_p (__CR6_LT, arg_1, arg_2);
> -  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
> +  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
>  }

Hrm.  People should not use the -mpower9-vector option (except implied
by -mcpu=power9, without vectors disabled).  How hard is it to give a
better error message here?

The obvious bugfixes independent of this series are of course okay for
trunk, as separate patches, now.  But some more work is needed
elsewhere.


Segher

* Re: [PATCH 18/18] rs6000: Add escape-newline support for builtins files
  2021-09-01 16:13 ` [PATCH 18/18] rs6000: Add escape-newline support for builtins files Bill Schmidt
@ 2021-11-05 23:50   ` Segher Boessenkool
  2021-11-08 19:40     ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-05 23:50 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Wed, Sep 01, 2021 at 11:13:54AM -0500, Bill Schmidt wrote:
> +/* Escape-newline support.  For readability, we prefer to allow developers
> +   to use escape-newline to continue long lines to the next one.  We
> +   maintain a buffer of "original" lines here, which are concatenated into
> +   linebuf, above, and which can be used to convert the virtual line
> +   position "line / pos" into actual line and position information.  */
> +#define MAXLINES 4

Make this bigger already?  Or, want to bet if we will need to increase
it for GCC 12 already?  Because for GCC 13 we almost certainly will :-)

> +/* From a possibly extended line with a virtual position, calculate
> +   the current line and character position.  */
> +static void
> +real_line_pos (int diagpos, int *real_line, int *real_pos)
> +{
> +  *real_line = line - lastline;
> +  *real_pos = diagpos;
> +
> +  for (int i = 0; i < MAXLINES && *real_pos > (int) strlen (lines[i]); i++)
> +    {
> +      (*real_line)++;
> +      *real_pos -= strlen (lines[i]) - 2;
> +    }
> +
> +  /* Convert from zero-base to one-base for printing.  */
> +  (*real_pos)++;
> +}

The cast is nasty, and you reuse that expression (which includes an
expensive call, which GCC might not realise it can CSE) anyway.  You
can rewrite this like

  for (int i = 0; i < MAXLINES; i++)
    {
      int len = strlen (lines[i]);
      if (*real_pos <= len)
	break;

      (*real_line)++;
      *real_pos -= len - 2;
    }

fixing both these issues.

> +/* Produce a fatal error message.  */
> +static void
> +fatal (const char *msg)
> +{
> +  fprintf (stderr, "FATAL: %s\n", msg);
> +  abort ();
> +}

No vfprintf?  Aww :-)  You didn't yet see any use for formatted fatal
errors I guess.
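
If one ever shows up, a formatted variant is simple enough.  A sketch
(not part of the patch; it needs <stdarg.h>):

  /* Hypothetical printf-style fatal error (sketch).  */
  static void
  fatal (const char *fmt, ...)
  {
    va_list ap;

    va_start (ap, fmt);
    fprintf (stderr, "FATAL: ");
    vfprintf (stderr, fmt, ap);
    fprintf (stderr, "\n");
    va_end (ap);
    abort ();
  }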

> +	      if (lastline == MAXLINES - 1)
> +		fatal ("number of supported overflow lines exceeded");
> +	      lastline++;

If you test after the increment you don't need the - 1 ;-)
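
That is, a sketch of the same check done after the increment:

  if (++lastline == MAXLINES)
    fatal ("number of supported overflow lines exceeded");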

> +	      line++;
> +	      if (!fgets (lines[lastline], LINELEN, file))
> +		fatal ("unexpected end of file");
> +	      strcpy (&linebuf[len - 2], lines[lastline]);

So, can this overflow linebuf?  I didn't see a test for that.
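
A guard before the strcpy along these lines would do.  A sketch only; it
assumes linebuf is declared as an array of LINELEN * MAXLINES chars,
which may not match the actual declaration:

  /* The continuation must fit in linebuf, NUL included.  */
  if (len - 2 + strlen (lines[lastline]) >= sizeof linebuf)
    fatal ("continued line overflows line buffer");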

> +  /* Allocate some buffers.  */
> +  for (int i = 0; i < MAXLINES; i++)
> +    lines[i] = (char *) malloc (LINELEN);

C++ forces such unsightly casts, sigh.  Well there are worse things.
Some certain operator comes to mind.


Thanks for this cleanup!  In the new builtin definitions lines are much
shorter than before, but a few got really long anyway :-)

Okay for trunk.  Thanks!


Segher

* Re: [PATCH 11/18] rs6000: Builtin expansion, part 6
  2021-11-04  1:24   ` Segher Boessenkool
@ 2021-11-07 15:28     ` Bill Schmidt
  2021-11-07 21:05       ` Segher Boessenkool
  0 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-11-07 15:28 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

Hi Segher,

Thank you for all of the reviews!  I appreciate your hard work and thorough 
study of the patches.

I've updated these 6 patches and combined them into 1, pushed today.  There 
are still a couple of cleanups I haven't done, but I made note in the code
where these are needed.

Thanks again!
Bill

On 11/3/21 8:24 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 01, 2021 at 11:13:47AM -0500, Bill Schmidt wrote:
>> Provide replacements for htm_spr_num and htm_expand_builtin.  No logic
>> changes are intended here, as usual.  Much code was factored out into
>> rs6000_expand_new_builtin, so the new version of htm_expand_builtin is
>> a little tidier.
> Nice.
>
>> Also implement the support for the "endian" and "32bit" attributes,
>> which is straightforward.  These just do icode substitution.
> Don't call this "attributes" please?  I don't know what would be a
> better name, of course.  "bif attribute" maybe?
>
>> +  rtx op[MAX_HTM_OPERANDS], pat;
> Don't declare arrays and scalars in the same statement, in general.  It
> is important that the arrays stand out.
>
> Also, don't declare things before they are used please.
>
>> +  FOR_EACH_CALL_EXPR_ARG (arg, iter, exp)
>> +    {
>> +      if (arg == error_mark_node || nopnds >= MAX_HTM_OPERANDS)
>> +	return const0_rtx;
>> +
>> +      insn_op = &insn_data[icode].operand[nopnds];
>> +      op[nopnds] = expand_normal (arg);
>> +
>> +      if (!insn_op->predicate (op[nopnds], insn_op->mode))
>> +	{
>> +	  if (!strcmp (insn_op->constraint, "n"))
>> +	    {
>> +	      int arg_num = (nonvoid) ? nopnds : nopnds + 1;
> Please don't parenthesise random expressions like "nonvoid".  I wonder
> if that can be handled more simply by just unshifting a void_node into
> the operands, btw :-)
>
> And the same "n" thing as before of course.  Since it is the same: some
> factoring would be helpful probably.
>
>> +      machine_mode mode = (TARGET_POWERPC64) ? DImode : SImode;
> Superfluous parens.  This is just "word_mode", anyway?
>
>> +  /* If this builtin accesses a CR, then pass in a scratch
>> +     CR as the last operand.  */
>> +  else if (bif_is_htmcr (*bifaddr))
>> +    {
>> +      cr = gen_reg_rtx (CCmode);
>> +      op[nopnds++] = cr;
>> +    }
> There is only one CR ("condition register").  You can say CRF here
> ("condition register field", a 4-bit thing), or just cc or CC maybe
> ("condition code").  A pet peeve, I know.
>
>> +  if (bif_is_endian (*bifaddr) && BYTES_BIG_ENDIAN)
> "is_endian" should maybe be "is_bigendian" or something like that?
>
> Okay for trunk with the changes you see fit at this time.  Thanks!
>
>
> Segher

* Re: [PATCH 11/18] rs6000: Builtin expansion, part 6
  2021-11-07 15:28     ` Bill Schmidt
@ 2021-11-07 21:05       ` Segher Boessenkool
  2021-11-08 13:16         ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Segher Boessenkool @ 2021-11-07 21:05 UTC (permalink / raw)
  To: Bill Schmidt; +Cc: gcc-patches, dje.gcc

Hi!

On Sun, Nov 07, 2021 at 09:28:09AM -0600, Bill Schmidt wrote:
> Thank you for all of the reviews!  I appreciate your hard work and thorough 
> study of the patches.
> 
> I've updated these 6 patches and combined them into 1, pushed today.  There 
> are still a couple of cleanups I haven't done, but I made note in the code
> where these are needed.

I did not approve the testsuite one; it needs more work?


Segher

* Re: [PATCH 11/18] rs6000: Builtin expansion, part 6
  2021-11-07 21:05       ` Segher Boessenkool
@ 2021-11-08 13:16         ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-11-08 13:16 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

Sorry for the misunderstanding.  What I meant was the 6 patches entitled "Builtin expansion, part N".

I still have 6-7 patches left to look at.

Thanks!
Bill

On 11/7/21 3:05 PM, Segher Boessenkool wrote:
> Hi!
>
> On Sun, Nov 07, 2021 at 09:28:09AM -0600, Bill Schmidt wrote:
>> Thank you for all of the reviews!  I appreciate your hard work and thorough 
>> study of the patches.
>>
>> I've updated these 6 patches and combined them into 1, pushed today.  There 
>> are still a couple of cleanups I haven't done, but I made note in the code
>> where these are needed.
> I did not approve the testsuite one, it needs more work?
>
>
> Segher

* Re: [PATCH 18/18] rs6000: Add escape-newline support for builtins files
  2021-11-05 23:50   ` Segher Boessenkool
@ 2021-11-08 19:40     ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-11-08 19:40 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 11/5/21 6:50 PM, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Sep 01, 2021 at 11:13:54AM -0500, Bill Schmidt wrote:
>> +/* Escape-newline support.  For readability, we prefer to allow developers
>> +   to use escape-newline to continue long lines to the next one.  We
>> +   maintain a buffer of "original" lines here, which are concatenated into
>> +   linebuf, above, and which can be used to convert the virtual line
>> +   position "line / pos" into actual line and position information.  */
>> +#define MAXLINES 4
> Make this bigger already?  Or, want to bet if we will need to increase
> it for GCC 12 already?  Because for GCC 13 we almost certainly will :-)

We *could*, but honestly I don't think we'll need it anytime soon.  The only
reason we need 4 is for a single built-in that takes sixteen parameters:

+  const vsc __builtin_vec_init_v16qi (signed char, signed char, signed char, \
+            signed char, signed char, signed char, signed char, signed char, \
+            signed char, signed char, signed char, signed char, signed char, \
+            signed char, signed char, signed char);

It's hard to think of a rational built-in that will need more space than this.
We can always make it bigger later if needed.

I'll make the rest of the cleanups you suggested.  Thanks again for the review!

Bill



* Re: [PATCH 14/18] rs6000: Debug support
  2021-11-05 21:34   ` Segher Boessenkool
@ 2021-11-09 15:06     ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-11-09 15:06 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc


On 11/5/21 4:34 PM, Segher Boessenkool wrote:
> On Wed, Sep 01, 2021 at 11:13:50AM -0500, Bill Schmidt wrote:
>> 	* config/rs6000/rs6000-call.c (rs6000_debug_type): New function.
>> 	(def_builtin): Change debug formatting for easier parsing and
>> 	include more information.
>> 	(rs6000_init_builtins): Add dump of autogenerated builtins.
>> 	(altivec_init_builtins): Dump __builtin_altivec_mask_for_load for
>> 	completeness.
>
>> +{
>> +  if (type == void_type_node)
>> +    return "void";
>> +  else if (type == long_integer_type_node)
>> +    return "long";
>> +  else if (type == long_unsigned_type_node)
>> +    return "ulong";
>> +  else if (type == long_long_integer_type_node)
>> +    return "longlong";
>> +  else if (type == long_long_unsigned_type_node)
>> +    return "ulonglong";
>> +  else if (type == bool_V2DI_type_node)
>> +    return "vbll";
>> +  else if (type == bool_V4SI_type_node)
>> +    return "vbi";
>> +  else if (type == bool_V8HI_type_node)
>> +    return "vbs";
>> +  else if (type == bool_V16QI_type_node)
>> +    return "vbc";
>> +  else if (type == bool_int_type_node)
>> +    return "bool";
>> +  else if (type == dfloat64_type_node)
>> +    return "_Decimal64";
>> +  else if (type == double_type_node)
>> +    return "double";
>> +  else if (type == intDI_type_node)
>> +    return "sll";
>> +  else if (type == intHI_type_node)
>> +    return "ss";
>> +  else if (type == ibm128_float_type_node)
>> +    return "__ibm128";
>> +  else if (type == opaque_V4SI_type_node)
>> +    return "opaque";
>> +  else if (POINTER_TYPE_P (type))
>> +    return "void*";
>> +  else if (type == intQI_type_node || type == char_type_node)
>> +    return "sc";
>> +  else if (type == dfloat32_type_node)
>> +    return "_Decimal32";
>> +  else if (type == float_type_node)
>> +    return "float";
>> +  else if (type == intSI_type_node || type == integer_type_node)
>> +    return "si";
>> +  else if (type == dfloat128_type_node)
>> +    return "_Decimal128";
>> +  else if (type == long_double_type_node)
>> +    return "longdouble";
>> +  else if (type == intTI_type_node)
>> +    return "sq";
>> +  else if (type == unsigned_intDI_type_node)
>> +    return "ull";
>> +  else if (type == unsigned_intHI_type_node)
>> +    return "us";
>> +  else if (type == unsigned_intQI_type_node)
>> +    return "uc";
>> +  else if (type == unsigned_intSI_type_node)
>> +    return "ui";
>> +  else if (type == unsigned_intTI_type_node)
>> +    return "uq";
>> +  else if (type == unsigned_V1TI_type_node)
>> +    return "vuq";
>> +  else if (type == unsigned_V2DI_type_node)
>> +    return "vull";
>> +  else if (type == unsigned_V4SI_type_node)
>> +    return "vui";
>> +  else if (type == unsigned_V8HI_type_node)
>> +    return "vus";
>> +  else if (type == unsigned_V16QI_type_node)
>> +    return "vuc";
>> +  else if (type == V16QI_type_node)
>> +    return "vsc";
>> +  else if (type == V1TI_type_node)
>> +    return "vsq";
>> +  else if (type == V2DF_type_node)
>> +    return "vd";
>> +  else if (type == V2DI_type_node)
>> +    return "vsll";
>> +  else if (type == V4SF_type_node)
>> +    return "vf";
>> +  else if (type == V4SI_type_node)
>> +    return "vsi";
>> +  else if (type == V8HI_type_node)
>> +    return "vss";
>> +  else if (type == pixel_V8HI_type_node)
>> +    return "vp";
>> +  else if (type == pcvoid_type_node)
>> +    return "voidc*";
>> +  else if (type == float128_type_node)
>> +    return "_Float128";
>> +  else if (type == vector_pair_type_node)
>> +    return "__vector_pair";
>> +  else if (type == vector_quad_type_node)
>> +    return "__vector_quad";
>> +  else
>> +    return "unknown";
>> +}
> Please use a switch statement for this.  You can call the variable
> "type_node" then as well, which would be a good idea.
>
Unfortunately you can't have a switch on a non-integer type, so I'm afraid
I'll have to leave this as is.  I'll make all the other suggested changes.
Thanks for the review!
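
For the record, a table-driven lookup could replace the else-if chain
without a switch.  The type nodes are only initialized at runtime, so
the table would have to hold their addresses.  A sketch, not what was
committed; the POINTER_TYPE_P and multi-node cases would still need
explicit checks, and ARRAY_SIZE is GCC's usual array-length macro:

  static const struct { tree *node; const char *name; } type_names[] = {
    { &void_type_node, "void" },
    { &long_integer_type_node, "long" },
    { &long_unsigned_type_node, "ulong" },
    /* ...one entry per type node...  */
  };

  for (size_t i = 0; i < ARRAY_SIZE (type_names); i++)
    if (type == *type_names[i].node)
      return type_names[i].name;
  return "unknown";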

Bill


* Re: [PATCH 16/18] rs6000: Test case adjustments
  2021-11-05 22:37   ` Segher Boessenkool
@ 2021-11-11 20:06     ` Bill Schmidt
  2021-11-11 20:55       ` Bill Schmidt
  0 siblings, 1 reply; 52+ messages in thread
From: Bill Schmidt @ 2021-11-11 20:06 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

Hi Segher,

[Sorry to be answering these out of order...]

On 11/5/21 5:37 PM, Segher Boessenkool wrote:
> On Wed, Sep 01, 2021 at 11:13:52AM -0500, Bill Schmidt wrote:
>> 	* gcc.target/powerpc/bfp/scalar-extract-exp-2.c: Adjust.
> My favourite changelog entry!  But, adjust to what?  This is the first
> line :-)
>
> "Adjust expected error message"?

OK, I'll be a bit less succinct. :-)
>
> But you should fold this patch with some previous patch anyway, when
> committing (or you break bisecting).

Yes, I failed to mention that patches 15-17 need to go in together to avoid
bisection problems.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-floatdouble.c
>> @@ -18,7 +18,7 @@ vector float test_fc ()
>>  vector double testd_00 (vector double x) { return vec_splat (x, 0b00000); }
>>  vector double testd_01 (vector double x) { return vec_splat (x, 0b00001); }
>>  vector double test_dc ()
>> -{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00010); }
>> +{ const vector double y = { 3.0, 5.0 }; return vec_splat (y, 0b00001); }
>>  
>>  /* If the source vector is a known constant, we will generate a load or possibly
>>     XXSPLTIW.  */
>> @@ -28,5 +28,5 @@ vector double test_dc ()
>>  /* { dg-final { scan-assembler-times {\mvspltw\M|\mxxspltw\M} 3 } } */
>>  
>>  /* For double types, we will generate xxpermdi instructions.  */
>> -/* { dg-final { scan-assembler-times "xxpermdi" 3 } } */
>> +/* { dg-final { scan-assembler-times "xxpermdi" 2 } } */
> Why these changes?

Sorry, I should have done a better job of explaining these.  For vector
double, only one bit matters, so the bit mask 0b00010 is a nonsensical
thing to have in the test case.  Replacing that with 0b00001 resulted
in one fewer xxpermdi required.  I'm going to review this one more time
to remind myself why, since I made this change a long time ago and it's
not fresh in my mind; it made sense then! :-)
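
For reference, vec_splat replicates the selected element across the
result, and a two-element vector only has elements 0 and 1, so only the
low bit of the selector can matter:

  const vector double y = { 3.0, 5.0 };
  vec_splat (y, 0b00000);   /* { 3.0, 3.0 } */
  vec_splat (y, 0b00001);   /* { 5.0, 5.0 } */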

>
>> --- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-longlong.c
>> @@ -9,23 +9,19 @@
>>  
>>  vector bool long long testb_00 (vector bool long long x) { return vec_splat (x, 0b00000); }
>>  vector bool long long testb_01 (vector bool long long x) { return vec_splat (x, 0b00001); }
>> -vector bool long long testb_02 (vector bool long long x) { return vec_splat (x, 0b00010); }
>>  
>>  vector signed long long tests_00 (vector signed long long x) { return vec_splat (x, 0b00000); }
>>  vector signed long long tests_01 (vector signed long long x) { return vec_splat (x, 0b00001); }
>> -vector signed long long tests_02 (vector signed long long x) { return vec_splat (x, 0b00010); }
>>  
>>  vector unsigned long long testu_00 (vector unsigned long long x) { return vec_splat (x, 0b00000); }
>>  vector unsigned long long testu_01 (vector unsigned long long x) { return vec_splat (x, 0b00001); }
>> -vector unsigned long long testu_02 (vector unsigned long long x) { return vec_splat (x, 0b00010); }
>>  
>>  /* Similar test as above, but the source vector is a known constant. */
>> -vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00010); }
>> -vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00010); }
>> -vector unsigned long long test_ull () { const vector unsigned long long y = {56, 67}; return vec_splat (y, 0b00010); }
>> +vector bool long long test_bll () { const vector bool long long y = {12, 23}; return vec_splat (y, 0b00001); }
>> +vector signed long long test_sll () { const vector signed long long y = {34, 45}; return vec_splat (y, 0b00001); }
>>  
>>  /* Assorted load instructions for the initialization with known constants. */
>> -/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M} 3 } } */
>> +/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M|\mlxv\M|\mplxv\M|\mxxspltib\M} 2 } } */
>>  
>>  /* xxpermdi for vec_splat of long long vectors.
>>   At the time of this writing, the number of xxpermdi instructions
> Ditto.

Same issue.  0b00010 makes no sense for vector long long.  I need to remind
myself about the change in counts here as well.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
>> @@ -11,9 +11,9 @@
>>  /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
>>  /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
>>  /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
>> -/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
>> -/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
>> -/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
>> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 24 } } */
>> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 26 } } */
>> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 26 } } */
>>  /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
>>  /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
>>  /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
> And this?

Again I'm a little sketchy on the details, but I believe this resulted
from some of the vector compares having been previously omitted by
accident from gimple expansion.  When I added them in for the new
support, that gave us increased counts here because the code generation
was improved.  I'll double-check this one as well to provide a more
certain explanation.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
>> @@ -126,6 +126,7 @@ void foo (vector signed char *vscr,
>>  /* { dg-final { scan-assembler-times "vsubcuw" 4 } } */
>>  /* { dg-final { scan-assembler-times "vsubuwm" 4 } } */
>>  /* { dg-final { scan-assembler-times "vbpermq" 2 } } */
>> +/* { dg-final { scan-assembler-times "vbpermd" 0 } } */
>>  /* { dg-final { scan-assembler-times "xxleqv" 4 } } */
>>  /* { dg-final { scan-assembler-times "vgbbd" 1 } } */
>>  /* { dg-final { scan-assembler-times "xxlnand" 4 } } */
> This curious one could have been a separate (obvious) patch.  It is a
> bit out-of-place here.

Yeah, bit of a head-scratcher, this.  The test case probably went
through a few revisions.  I'll test it once more and commit it
separately.

>
>> --- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
>> @@ -19,6 +19,7 @@ test1 (vector int a, vector int b)
>>  #pragma GCC target ("cpu=power7")
>>  /* Force a re-read of altivec.h with new cpu target. */
>>  #undef _ALTIVEC_H
>> +#undef _RS6000_VECDEFINES_H
>>  #include <altivec.h>
> Wow ugly :-)  But nothing new here, heh.  Best not to look at testcase
> internals too closely, in any case.
>
>> --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
>> @@ -1,5 +1,6 @@
>>  /* { dg-do run { target { powerpc*-*-* } } } */
>> -/* { dg-options "-O2 -std=c99" } */
>> +/* { dg-options "-O2 -std=c99 -mcpu=power9" } */
>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>>  
>>  #ifdef DEBUG
>>  #include <stdio.h>
> This one is a bug fix as well (and obvious).

Yeah. :-(  Will handle.
>
>> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
>> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
>> @@ -12,5 +12,5 @@ test_all_not_equal_and_not_zero (vector unsigned short *arg1_p,
>>    vector unsigned short arg_2 = *arg2_p;
>>  
>>    return __builtin_vec_vcmpnez_p (__CR6_LT, arg_1, arg_2);
>> -  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
>> +  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
>>  }
> Hrm.  People should not use the -mpower9-vector option (except implied
> by -mcpu=power9, without vectors disabled).  How hard is it to give a
> better error message here?

Yeah, agreed, I think I can fix that easily enough.  There may be similar
issues with -mpower8-vector as well that should be fixed.

Thanks for the review!  I'll get back on this one soon.

Bill

>
> The obvious bugfixes independent of this series are of course okay for
> trunk, as separate patches, now.  But some more work is needed
> elsewhere.
>
>
> Segher

* Re: [PATCH 16/18] rs6000: Test case adjustments
  2021-11-11 20:06     ` Bill Schmidt
@ 2021-11-11 20:55       ` Bill Schmidt
  0 siblings, 0 replies; 52+ messages in thread
From: Bill Schmidt @ 2021-11-11 20:55 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, dje.gcc

On 11/11/21 2:06 PM, Bill Schmidt wrote:
>
>>> --- a/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/int_128bit-runnable.c
>>> @@ -11,9 +11,9 @@
>>>  /* { dg-final { scan-assembler-times {\mvrlq\M} 2 } } */
>>>  /* { dg-final { scan-assembler-times {\mvrlqnm\M} 2 } } */
>>>  /* { dg-final { scan-assembler-times {\mvrlqmi\M} 2 } } */
>>> -/* { dg-final { scan-assembler-times {\mvcmpequq\M} 16 } } */
>>> -/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 16 } } */
>>> -/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 16 } } */
>>> +/* { dg-final { scan-assembler-times {\mvcmpequq\M} 24 } } */
>>> +/* { dg-final { scan-assembler-times {\mvcmpgtsq\M} 26 } } */
>>> +/* { dg-final { scan-assembler-times {\mvcmpgtuq\M} 26 } } */
>>>  /* { dg-final { scan-assembler-times {\mvmuloud\M} 1 } } */
>>>  /* { dg-final { scan-assembler-times {\mvmulesd\M} 1 } } */
>>>  /* { dg-final { scan-assembler-times {\mvmulosd\M} 1 } } */
>> And this?
> Again I'm a little sketchy on the details, but I believe this resulted
> from some of the vector compares having been previously omitted by
> accident from gimple expansion.  When I added them in for the new
> support, that gave us increased counts here because the code generation
> was improved.  I'll double-check this one as well to provide a more
> certain explanation.

Upon review [1], it was the other way around.  I removed some of the
builtins from early gimple expansion because if we expand those early,
we get poor code generation instead of the vector compare instructions
we want.  As a result we get more matches in this test case.

Thanks!
Bill

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576526.html

>
>>> --- a/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/p8vector-builtin-8.c
>>> @@ -126,6 +126,7 @@ void foo (vector signed char *vscr,
>>>  /* { dg-final { scan-assembler-times "vsubcuw" 4 } } */
>>>  /* { dg-final { scan-assembler-times "vsubuwm" 4 } } */
>>>  /* { dg-final { scan-assembler-times "vbpermq" 2 } } */
>>> +/* { dg-final { scan-assembler-times "vbpermd" 0 } } */
>>>  /* { dg-final { scan-assembler-times "xxleqv" 4 } } */
>>>  /* { dg-final { scan-assembler-times "vgbbd" 1 } } */
>>>  /* { dg-final { scan-assembler-times "xxlnand" 4 } } */
>> This curious one could have been a separate (obvious) patch.  It is a
>> bit out-of-place here.
> Yeah, bit of a head-scratcher, this.  The test case probably went
> through a few revisions.  I'll test it once more and commit it
> separately.
>
>>> --- a/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pragma_power8.c
>>> @@ -19,6 +19,7 @@ test1 (vector int a, vector int b)
>>>  #pragma GCC target ("cpu=power7")
>>>  /* Force a re-read of altivec.h with new cpu target. */
>>>  #undef _ALTIVEC_H
>>> +#undef _RS6000_VECDEFINES_H
>>>  #include <altivec.h>
>> Wow ugly :-)  But nothing new here, heh.  Best not to look at testcase
>> internals too closely, in any case.
>>
>>> --- a/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/test_mffsl.c
>>> @@ -1,5 +1,6 @@
>>>  /* { dg-do run { target { powerpc*-*-* } } } */
>>> -/* { dg-options "-O2 -std=c99" } */
>>> +/* { dg-options "-O2 -std=c99 -mcpu=power9" } */
>>> +/* { dg-require-effective-target powerpc_p9vector_ok } */
>>>  
>>>  #ifdef DEBUG
>>>  #include <stdio.h>
>> This one is a bug fix as well (and obvious).
> Yeah. :-(  Will handle.
>>> --- a/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
>>> +++ b/gcc/testsuite/gcc.target/powerpc/vsu/vec-all-nez-7.c
>>> @@ -12,5 +12,5 @@ test_all_not_equal_and_not_zero (vector unsigned short *arg1_p,
>>>    vector unsigned short arg_2 = *arg2_p;
>>>  
>>>    return __builtin_vec_vcmpnez_p (__CR6_LT, arg_1, arg_2);
>>> -  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mcpu=power9' option" "" { target *-*-* } .-1 } */
>>> +  /* { dg-error "'__builtin_altivec_vcmpnezh_p' requires the '-mpower9-vector' option" "" { target *-*-* } .-1 } */
>>>  }
>> Hrm.  People should not use the -mpower9-vector option (except implied
>> by -mcpu=power9, without vectors disabled).  How hard is it to give a
>> better error message here?
> Yeah, agreed, I think I can fix that easily enough.  There may be similar
> issues with -mpower8-vector as well that should be fixed.
>
> Thanks for the review!  I'll get back on this one soon.
>
> Bill
>
>> The obvious bugfixes independent of this series are of course okay for
>> trunk, as separate patches, now.  But some more work is needed
>> elsewhere.
>>
>>
>> Segher

Thread overview: 52+ messages
2021-09-01 16:13 [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
2021-09-01 16:13 ` [PATCH 01/18] rs6000: Handle overloads during program parsing Bill Schmidt
2021-09-13 17:17   ` will schmidt
2021-09-13 23:53   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 02/18] rs6000: Move __builtin_mffsl to the [always] stanza Bill Schmidt
2021-09-13 17:53   ` will schmidt
2021-09-16 22:52   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 03/18] rs6000: Handle gimple folding of target built-ins Bill Schmidt
2021-09-13 18:42   ` will schmidt
2021-09-14 22:36     ` Bill Schmidt
2021-09-16 22:58   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 04/18] rs6000: Handle some recent MMA builtin changes Bill Schmidt
2021-09-13 19:02   ` will schmidt
2021-09-16 23:38   ` Segher Boessenkool
2021-09-17 15:14     ` Bill Schmidt
2021-09-01 16:13 ` [PATCH 05/18] rs6000: Support for vectorizing built-in functions Bill Schmidt
2021-09-13 19:29   ` will schmidt
2021-09-17 12:17   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 06/18] rs6000: Builtin expansion, part 1 Bill Schmidt
2021-10-31  3:24   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 07/18] rs6000: Builtin expansion, part 2 Bill Schmidt
2021-11-01 12:18   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 08/18] rs6000: Builtin expansion, part 3 Bill Schmidt
2021-11-03  1:15   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 09/18] rs6000: Builtin expansion, part 4 Bill Schmidt
2021-11-03  1:52   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 10/18] rs6000: Builtin expansion, part 5 Bill Schmidt
2021-11-04  0:55   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 11/18] rs6000: Builtin expansion, part 6 Bill Schmidt
2021-11-04  1:24   ` Segher Boessenkool
2021-11-07 15:28     ` Bill Schmidt
2021-11-07 21:05       ` Segher Boessenkool
2021-11-08 13:16         ` Bill Schmidt
2021-09-01 16:13 ` [PATCH 12/18] rs6000: Update rs6000_builtin_decl Bill Schmidt
2021-11-05 20:27   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 13/18] rs6000: Miscellaneous uses of rs6000_builtins_decl_x Bill Schmidt
2021-11-05 20:36   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 14/18] rs6000: Debug support Bill Schmidt
2021-11-05 21:34   ` Segher Boessenkool
2021-11-09 15:06     ` Bill Schmidt
2021-09-01 16:13 ` [PATCH 15/18] rs6000: Update altivec.h for automated interfaces Bill Schmidt
2021-11-05 22:08   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 16/18] rs6000: Test case adjustments Bill Schmidt
2021-11-05 22:37   ` Segher Boessenkool
2021-11-11 20:06     ` Bill Schmidt
2021-11-11 20:55       ` Bill Schmidt
2021-09-01 16:13 ` [PATCH 17/18] rs6000: Enable the new builtin support Bill Schmidt
2021-11-05 22:10   ` Segher Boessenkool
2021-09-01 16:13 ` [PATCH 18/18] rs6000: Add escape-newline support for builtins files Bill Schmidt
2021-11-05 23:50   ` Segher Boessenkool
2021-11-08 19:40     ` Bill Schmidt
2021-09-13 13:33 ` [PATCHv5 00/18] Replace the Power target-specific builtin machinery Bill Schmidt
