From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <meissner@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1005)
	id A575E385840E; Tue, 23 Jan 2024 07:36:47 +0000 (GMT)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work154-vcombo)] Add vector pair
 built-in functions.
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner <meissner@linux.ibm.com>
X-Git-Refname: refs/users/meissner/heads/work154-vcombo
X-Git-Oldrev: 2ffec8b255662b47175aabab782f63a73ef0755a
X-Git-Newrev: ea987aafefb63acd3b6ee5a9c850d55a09be6f1d
Message-Id: <20240123073647.A575E385840E@sourceware.org>
Date: Tue, 23 Jan 2024 07:36:47 +0000 (GMT)
List-Id: <gcc-cvs.sourceware.org>

https://gcc.gnu.org/g:ea987aafefb63acd3b6ee5a9c850d55a09be6f1d

commit ea987aafefb63acd3b6ee5a9c850d55a09be6f1d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Jan 23 02:34:39 2024 -0500

    Add vector pair built-in functions.
    
    2024-01-23  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/vector-pair.md (vpair_add_neg_<vpair_modename>3): New
            combiner insn to convert vector plus/neg into a minus operation.
            (vpair_fma_<vpair_modename>_merge): Optimize multiply, add/subtract, and
            negation into fma operations if the user specifies to create fmas.
            (vpair_fma_<vpair_modename>_merge): Likewise.
            (vpair_fma_<vpair_modename>_merge2): Likewise.
            (vpair_nfma_<vpair_modename>_merge): Likewise.
            (vpair_nfms_<vpair_modename>_merge): Likewise.
            (vpair_nfms_<vpair_modename>_merge2): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-7.c: New test.
            * gcc.target/powerpc/vector-pair-8.c: Likewise.
            * gcc.target/powerpc/vector-pair-9.c: Likewise.
            * gcc.target/powerpc/vector-pair-10.c: Likewise.
            * gcc.target/powerpc/vector-pair-11.c: Likewise.
            * gcc.target/powerpc/vector-pair-12.c: Likewise.
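
A minimal sketch of the merges listed in this entry, written as a portable C
model rather than as GCC code. The node kinds, struct node, eval and
merge_root are hypothetical names for this illustration only; they loosely
mirror the UNSPEC_VPAIR_* codes. The contract_fast flag stands in for the
flag_fp_contract_mode == FP_CONTRACT_FAST test on the merge patterns. FMA is
modeled with two roundings, so the sketch checks the algebra only, not the
single rounding of a real fused operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds, loosely mirroring the UNSPEC_VPAIR_* codes.  */
enum kind { LEAF, NEG, PLUS, MINUS, MULT, FMA };

struct node {
  enum kind kind;
  double value;			/* Used by LEAF only.  */
  struct node *op[3];		/* Operands; unused slots are NULL.  */
};

/* Evaluate a tree.  FMA is modeled as a*b+c with two roundings.  */
static double
eval (const struct node *n)
{
  switch (n->kind)
    {
    case LEAF:  return n->value;
    case NEG:   return -eval (n->op[0]);
    case PLUS:  return eval (n->op[0]) + eval (n->op[1]);
    case MINUS: return eval (n->op[0]) - eval (n->op[1]);
    case MULT:  return eval (n->op[0]) * eval (n->op[1]);
    case FMA:   return eval (n->op[0]) * eval (n->op[1]) + eval (n->op[2]);
    }
  assert (!"unknown node kind");
  return 0.0;
}

/* Rewrite N in place and return 1 if something changed.  SCRATCH provides
   storage for the NEG node needed when a subtract becomes an FMA.  */
static int
merge_root (struct node *n, struct node *scratch, int contract_fast)
{
  if (n->kind == NEG)
    /* nfma/nfms: keep the outer NEG and merge what is underneath.  */
    return merge_root (n->op[0], scratch, contract_fast);

  if (contract_fast && (n->kind == PLUS || n->kind == MINUS)
      && n->op[0]->kind == MULT)
    {
      struct node *mul = n->op[0];
      struct node *addend = n->op[1];
      if (n->kind == MINUS)
	{
	  /* a*b - c becomes fma (a, b, -c).  */
	  scratch->kind = NEG;
	  scratch->value = 0.0;
	  scratch->op[0] = addend;
	  scratch->op[1] = scratch->op[2] = NULL;
	  addend = scratch;
	}
      n->kind = FMA;
      n->op[0] = mul->op[0];
      n->op[1] = mul->op[1];
      n->op[2] = addend;
      return 1;
    }

  if (n->kind == PLUS && n->op[1]->kind == NEG)
    {
      /* a + (-b) becomes a - b.  */
      n->kind = MINUS;
      n->op[1] = n->op[1]->op[0];
      return 1;
    }
  return 0;
}
```

With leaves a = 3, b = 4 and c = 5, the tree NEG (MINUS (MULT (a, b), c))
evaluates to -7 both before and after merge_root turns the inner node into an
FMA with a negated addend. With contract_fast set to 0, only the add/neg to
minus rewrite is applied.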
    
    2024-01-23  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New
            built-in function.
            (__builtin_vpair_f32_splat): Likewise.
            (__builtin_vpair_f64_splat): Likewise.
            * config/rs6000/vector-pair.md (UNSPEC_VPAIR_ZERO): New unspec.
            (UNSPEC_VPAIR_SPLAT): Likewise.
            (VPAIR_SPLAT_VMODE): New mode iterator.
            (VPAIR_SPLAT_ELEMENT_TO_VMODE): New mode attribute.
            (vpair_splat_name): Likewise.
            (vpair_zero): New insn.
            (vpair_splat_<vpair_splat_name>): New define_expand.
            (vpair_splat_<vpair_splat_name>_internal): New insns.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-5.c: New test.
            * gcc.target/powerpc/vector-pair-6.c: Likewise.
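
A minimal sketch of how the internal splat insn chooses its moves after
register allocation, as a portable C model rather than GCC code. struct move
and splat_moves are hypothetical names, and the sketch assumes the pair lives
in two consecutive registers. The point is only that one copy can be dropped
when the 128-bit splat already sits in one half of the destination pair.

```c
#include <assert.h>

/* One register-to-register copy in the model.  */
struct move { unsigned dest, src; };

/* Model of the splat split.  The pair is assumed to occupy registers
   PAIR_REGNO and PAIR_REGNO + 1 (an assumption of this sketch); SRC_REGNO
   holds the 128-bit splat.  Fills MOVES and returns the number of copies.  */
static unsigned
splat_moves (unsigned pair_regno, unsigned src_regno, struct move moves[2])
{
  unsigned half_a = pair_regno;
  unsigned half_b = pair_regno + 1;
  unsigned n = 0;

  if (src_regno == half_a)
    /* The first half is already in place; only copy to the second.  */
    moves[n++] = (struct move) { half_b, src_regno };

  else if (src_regno == half_b)
    /* The second half is already in place; only copy to the first.  */
    moves[n++] = (struct move) { half_a, src_regno };

  else
    {
      moves[n++] = (struct move) { half_a, src_regno };
      moves[n++] = (struct move) { half_b, src_regno };
    }

  assert (n == 1 || n == 2);
  return n;
}
```

For example, splat_moves (34, 34, m) yields a single copy into register 35,
while splat_moves (34, 40, m) yields copies into both 34 and 35.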
    
    2024-01-23  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_fma): New
            built-in.
            (__builtin_vpair_f32_fms): Likewise.
            (__builtin_vpair_f32_nfma): Likewise.
            (__builtin_vpair_f32_nfms): Likewise.
            (__builtin_vpair_f64_fma): Likewise.
            (__builtin_vpair_f64_fms): Likewise.
            (__builtin_vpair_f64_nfma): Likewise.
            (__builtin_vpair_f64_nfms): Likewise.
            * config/rs6000/rs6000-protos.h (enum vpair_split_fma): New
            enumeration.
            (vpair_split_fma): New declaration.
            * config/rs6000/rs6000.cc (vpair_split_fma): New function to split
            vector pair FMA operations.
            * config/rs6000/vector-pair.md (UNSPEC_VPAIR_FMA): New unspec.
            (vpair_stdname): Add UNSPEC_VPAIR_FMA.
            (VPAIR_OP): Likewise.
            (vpair_fma_<vpair_modename>4): New insns.
            (vpair_fms_<vpair_modename>4): Likewise.
            (vpair_nfma_<vpair_modename>4): Likewise.
            (vpair_nfms_<vpair_modename>4): Likewise.
            * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document new
            vector pair fma built-in functions.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-3.c: New test.
            * gcc.target/powerpc/vector-pair-4.c: Likewise.
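
A minimal sketch of the four FMA variants and of the split into two 128-bit
halves, as a portable C model rather than GCC code. struct pair_f64, enum
split_fma and pair_fma are hypothetical names for this illustration. The
multiply-add is written as a*b+c, so the sketch shows where the negations go
but does not reproduce the single rounding of a fused operation.

```c
#include <assert.h>

/* Hypothetical counterpart of enum vpair_split_fma.  */
enum split_fma { SPLIT_FMA, SPLIT_FMS, SPLIT_NFMA, SPLIT_NFMS };

/* A 256-bit pair of doubles modeled as two 128-bit halves.  */
struct pair_f64 { double half[2][2]; };

/* Apply one FMA variant to each half of the pair.  */
static struct pair_f64
pair_fma (struct pair_f64 a, struct pair_f64 b, struct pair_f64 c,
	  enum split_fma action)
{
  struct pair_f64 r;

  for (int h = 0; h < 2; h++)		/* One iteration per 128-bit half.  */
    for (int i = 0; i < 2; i++)
      {
	double addend = c.half[h][i];

	/* First step: FMS and NFMS negate the addend.  */
	switch (action)
	  {
	  case SPLIT_FMA:
	  case SPLIT_NFMA:
	    break;
	  case SPLIT_FMS:
	  case SPLIT_NFMS:
	    addend = -addend;
	    break;
	  default:
	    assert (!"bad action");
	  }

	double t = a.half[h][i] * b.half[h][i] + addend;

	/* Second step: NFMA and NFMS negate the whole result.  */
	if (action == SPLIT_NFMA || action == SPLIT_NFMS)
	  t = -t;

	r.half[h][i] = t;
      }
  return r;
}
```

With every element of a, b and c set to 2, 3 and 1 respectively, the four
variants give 7 (fma), 5 (fms), -7 (nfma) and -5 (nfms) in every element.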
    
    2024-01-23  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_*): Add new
            built-in functions for vector pair support.
            * config/rs6000/rs6000-protos.h (enum vpair_split_unary): New
            enumeration.
            (vpair_split_unary): New declaration.
            (vpair_split_binary): Likewise.
            * config/rs6000/rs6000.cc (vpair_split_unary): New function to split
            vector pair operations.
            (vpair_split_binary): Likewise.
            * config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
            * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
            * config/rs6000/vector-pair.md: New file.
            * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Add
            documentation for the new vector pair built-in functions.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-1.c: New test.
            * gcc.target/powerpc/vector-pair-2.c: Likewise.
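
A minimal sketch of what splitting a vector pair binary operation means, as a
portable C model rather than GCC code. pair256, v4sf, get_half, set_half,
v4sf_add and pair_binary are hypothetical names for this illustration. The
32-byte value is processed as two 16-byte halves at byte offsets 0 and 16,
which are the offsets the split functions pass to simplify_gen_subreg.

```c
#include <assert.h>
#include <string.h>

typedef struct { unsigned char bytes[32]; } pair256;	/* Model of a pair.  */
typedef struct { float elt[4]; } v4sf;			/* One 128-bit half.  */

/* Read the half of P that starts at byte OFFSET (0 or 16).  */
static v4sf
get_half (const pair256 *p, size_t offset)
{
  v4sf v;
  assert (offset == 0 || offset == 16);
  memcpy (&v, p->bytes + offset, sizeof v);
  return v;
}

/* Write V into the half of P that starts at byte OFFSET (0 or 16).  */
static void
set_half (pair256 *p, size_t offset, v4sf v)
{
  assert (offset == 0 || offset == 16);
  memcpy (p->bytes + offset, &v, sizeof v);
}

/* Example 128-bit operation: element-wise add.  */
static v4sf
v4sf_add (v4sf x, v4sf y)
{
  v4sf r;
  for (int i = 0; i < 4; i++)
    r.elt[i] = x.elt[i] + y.elt[i];
  return r;
}

/* Apply OP to each half independently, as the split does.  */
static void
pair_binary (pair256 *dest, const pair256 *x, const pair256 *y,
	     v4sf (*op) (v4sf, v4sf))
{
  for (size_t offset = 0; offset < 32; offset += 16)
    set_half (dest, offset, op (get_half (x, offset), get_half (y, offset)));
}
```

Because the two halves are independent, the two resulting operations can be
scheduled separately, which is the motivation given in vector-pair.md.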

Diff:
---
 gcc/config/rs6000/rs6000-builtins.def             |  90 ++++
 gcc/config/rs6000/rs6000-protos.h                 |  25 +
 gcc/config/rs6000/rs6000.cc                       | 138 +++++
 gcc/config/rs6000/rs6000.md                       |   1 +
 gcc/config/rs6000/t-rs6000                        |   1 +
 gcc/config/rs6000/vector-pair.md                  | 580 ++++++++++++++++++++++
 gcc/doc/extend.texi                               |  85 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-1.c  |  87 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-10.c |  61 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-11.c |  65 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-12.c |  65 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-2.c  |  86 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-3.c  |  57 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-4.c  |  57 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-5.c  |  56 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-6.c  |  56 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-7.c  |  18 +
 gcc/testsuite/gcc.target/powerpc/vector-pair-8.c  |  18 +
 gcc/testsuite/gcc.target/powerpc/vector-pair-9.c  |  61 +++
 19 files changed, 1607 insertions(+)

diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index 3bc7fed6956..b757a8630ff 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4131,3 +4131,93 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
     STXVP nothing {mma,pair}
+
+;; Vector pair built-in functions.
+  v256 __builtin_vpair_zero ();
+    VPAIR_ZERO vpair_zero {mma}
+
+;; Vector pair built-in functions with float elements
+  v256 __builtin_vpair_f32_abs (v256);
+    VPAIR_F32_ABS vpair_abs_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_add (v256, v256);
+    VPAIR_F32_ADD vpair_add_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_div (v256, v256);
+    VPAIR_F32_DIV vpair_div_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_fma (v256, v256, v256);
+    VPAIR_F32_FMA vpair_fma_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_fms (v256, v256, v256);
+    VPAIR_F32_FMS vpair_fms_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_max (v256, v256);
+    VPAIR_F32_MAX vpair_smax_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_min (v256, v256);
+    VPAIR_F32_MIN vpair_smin_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_mul (v256, v256);
+    VPAIR_F32_MUL vpair_mul_v8sf3 {mma}
+
+  v256 __builtin_vpair_f32_nabs (v256);
+    VPAIR_F32_NABS vpair_nabs_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_neg (v256);
+    VPAIR_F32_NEG vpair_neg_v8sf2 {mma}
+
+  v256 __builtin_vpair_f32_nfma (v256, v256, v256);
+    VPAIR_F32_NFMA vpair_nfma_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_nfms (v256, v256, v256);
+    VPAIR_F32_NFMS vpair_nfms_v8sf4 {mma}
+
+  v256 __builtin_vpair_f32_splat (float);
+    VPAIR_F32_SPLAT vpair_splat_v8sf {mma}
+
+  v256 __builtin_vpair_f32_sub (v256, v256);
+    VPAIR_F32_SUB vpair_sub_v8sf3 {mma}
+
+;; Vector pair built-in functions with double elements
+  v256 __builtin_vpair_f64_abs (v256);
+    VPAIR_F64_ABS vpair_abs_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_add (v256, v256);
+    VPAIR_F64_ADD vpair_add_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_div (v256, v256);
+    VPAIR_F64_DIV vpair_div_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_fma (v256, v256, v256);
+    VPAIR_F64_FMA vpair_fma_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_fms (v256, v256, v256);
+    VPAIR_F64_FMS vpair_fms_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_max (v256, v256);
+    VPAIR_F64_MAX vpair_smax_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_min (v256, v256);
+    VPAIR_F64_MIN vpair_smin_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_mul (v256, v256);
+    VPAIR_F64_MUL vpair_mul_v4df3 {mma}
+
+  v256 __builtin_vpair_f64_nabs (v256);
+    VPAIR_F64_NABS vpair_nabs_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_neg (v256);
+    VPAIR_F64_NEG vpair_neg_v4df2 {mma}
+
+  v256 __builtin_vpair_f64_nfma (v256, v256, v256);
+    VPAIR_F64_NFMA vpair_nfma_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_nfms (v256, v256, v256);
+    VPAIR_F64_NFMS vpair_nfms_v4df4 {mma}
+
+  v256 __builtin_vpair_f64_splat (double);
+    VPAIR_F64_SPLAT vpair_splat_v4df {mma}
+
+  v256 __builtin_vpair_f64_sub (v256, v256);
+    VPAIR_F64_SUB vpair_sub_v4df3 {mma}
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 09a57a806fa..aed4081c87b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -162,6 +162,31 @@ extern bool rs6000_pcrel_p (void);
 extern bool rs6000_fndecl_pcrel_p (const_tree);
 extern void rs6000_output_addr_vec_elt (FILE *, int);
 
+/* If we are splitting a vector pair unary operator into two separate vector
+   operations, we need to generate a NEG if this is NABS.  */
+
+enum vpair_split_unary {
+  VPAIR_SPLIT_NORMAL,		/* No extra processing is needed.  */
+  VPAIR_SPLIT_NEGATE		/* Wrap operation with a NEG.  */
+};
+
+extern void vpair_split_unary (rtx [], machine_mode, enum rtx_code,
+			       enum vpair_split_unary);
+extern void vpair_split_binary (rtx [], machine_mode, enum rtx_code);
+
+/* When we are splitting a vector pair FMA operation into two vector
+   operations, we may need to modify the code generated.  This enumeration
+   encodes the different choices.  */
+
+enum vpair_split_fma {
+  VPAIR_SPLIT_FMA,		/* Fused multiply-add.  */
+  VPAIR_SPLIT_FMS,		/* Fused multiply-subtract.  */
+  VPAIR_SPLIT_NFMA,		/* Fused negate multiply-add.  */
+  VPAIR_SPLIT_NFMS		/* Fused negate multiply-subtract.  */
+};
+
+extern void vpair_split_fma (rtx [], machine_mode, enum vpair_split_fma);
+
 /* Different PowerPC instruction formats that are used by GCC.  There are
    various other instruction formats used by the PowerPC hardware, but these
    formats are not currently used by GCC.  */
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index f3aa1c15f68..055cc55ffc9 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -29391,7 +29391,145 @@ rs6000_opaque_type_invalid_use_p (gimple *stmt)
 
   return false;
 }
+
+/* Split vector pair unary operations.  */
+
+void
+vpair_split_unary (rtx operands[],			/* Dest, input.  */
+		   machine_mode vmode,			/* Vector mode.  */
+		   enum rtx_code code,			/* Operator code.  */
+		   enum vpair_split_unary action)	/* Action to take.  */
+{
+  rtx op0 = operands[0];
+  machine_mode mode0 = GET_MODE (op0);
+  gcc_assert (GET_MODE_SIZE (mode0) == 32);
+  rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0);
+  rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16);
+
+  rtx op1 = operands[1];
+  machine_mode mode1 = GET_MODE (op1);
+  gcc_assert (GET_MODE_SIZE (mode1) == 32);
+  rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0);
+  rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16);
+
+  rtx operation_a = gen_rtx_fmt_e (code, vmode, op1_a);
+  rtx operation_b = gen_rtx_fmt_e (code, vmode, op1_b);
+
+  if (action == VPAIR_SPLIT_NEGATE)
+    {
+      operation_a = gen_rtx_NEG (vmode, operation_a);
+      operation_b = gen_rtx_NEG (vmode, operation_b);
+    }
+
+  emit_insn (gen_rtx_SET (op0_a, operation_a));
+  emit_insn (gen_rtx_SET (op0_b, operation_b));
+  return;
+}
+
+/* Split vector pair binary operations.  */
+
+void
+vpair_split_binary (rtx operands[],			/* Dest, 2 inputs.  */
+		    machine_mode vmode,			/* Vector mode.  */
+		    enum rtx_code code)			/* Operator code.  */
+{
+  rtx op0 = operands[0];
+  machine_mode mode0 = GET_MODE (op0);
+  gcc_assert (GET_MODE_SIZE (mode0) == 32);
+  rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0);
+  rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16);
+
+  rtx op1 = operands[1];
+  machine_mode mode1 = GET_MODE (op1);
+  gcc_assert (GET_MODE_SIZE (mode1) == 32);
+  rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0);
+  rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16);
+
+  rtx op2 = operands[2];
+  machine_mode mode2 = GET_MODE (op2);
+  gcc_assert (GET_MODE_SIZE (mode2) == 32);
+  rtx op2_a = simplify_gen_subreg (vmode, op2, mode2, 0);
+  rtx op2_b = simplify_gen_subreg (vmode, op2, mode2, 16);
+
+  rtx operation_a = gen_rtx_fmt_ee (code, vmode, op1_a, op2_a);
+  rtx operation_b = gen_rtx_fmt_ee (code, vmode, op1_b, op2_b);
+
+  emit_insn (gen_rtx_SET (op0_a, operation_a));
+  emit_insn (gen_rtx_SET (op0_b, operation_b));
+  return;
+}
+
+/* Split vector pair fma operations.  */
+
+void
+vpair_split_fma (rtx operands[],			/* Dest, 3 inputs.  */
+		 machine_mode vmode,			/* Vector mode.  */
+		 enum vpair_split_fma action)		/* Action to take.  */
+{
+  rtx op0 = operands[0];
+  machine_mode mode0 = GET_MODE (op0);
+  gcc_assert (GET_MODE_SIZE (mode0) == 32);
+  rtx op0_a = simplify_gen_subreg (vmode, op0, mode0, 0);
+  rtx op0_b = simplify_gen_subreg (vmode, op0, mode0, 16);
+
+  rtx op1 = operands[1];
+  machine_mode mode1 = GET_MODE (op1);
+  gcc_assert (GET_MODE_SIZE (mode1) == 32);
+  rtx op1_a = simplify_gen_subreg (vmode, op1, mode1, 0);
+  rtx op1_b = simplify_gen_subreg (vmode, op1, mode1, 16);
+
+  rtx op2 = operands[2];
+  machine_mode mode2 = GET_MODE (op2);
+  gcc_assert (GET_MODE_SIZE (mode2) == 32);
+  rtx op2_a = simplify_gen_subreg (vmode, op2, mode2, 0);
+  rtx op2_b = simplify_gen_subreg (vmode, op2, mode2, 16);
+
+  rtx op3 = operands[3];
+  machine_mode mode3 = GET_MODE (op3);
+  gcc_assert (GET_MODE_SIZE (mode3) == 32);
+  rtx op3_a = simplify_gen_subreg (vmode, op3, mode3, 0);
+  rtx op3_b = simplify_gen_subreg (vmode, op3, mode3, 16);
+
+  switch (action)
+    {
+    case VPAIR_SPLIT_FMA:
+    case VPAIR_SPLIT_NFMA:
+      break;
+
+    case VPAIR_SPLIT_FMS:
+    case VPAIR_SPLIT_NFMS:
+      op3_a = gen_rtx_NEG (vmode, op3_a);
+      op3_b = gen_rtx_NEG (vmode, op3_b);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
+
+  rtx operation_a = gen_rtx_fmt_eee (FMA, vmode, op1_a, op2_a, op3_a);
+  rtx operation_b = gen_rtx_fmt_eee (FMA, vmode, op1_b, op2_b, op3_b);
+
+  switch (action)
+    {
+    case VPAIR_SPLIT_FMA:
+    case VPAIR_SPLIT_FMS:
+      break;
+
+    case VPAIR_SPLIT_NFMA:
+    case VPAIR_SPLIT_NFMS:
+      operation_a = gen_rtx_NEG (vmode, operation_a);
+      operation_b = gen_rtx_NEG (vmode, operation_b);
+      break;
+
+    default:
+      gcc_unreachable ();
+    }
 
+  emit_insn (gen_rtx_SET (op0_a, operation_a));
+  emit_insn (gen_rtx_SET (op0_b, operation_b));
+  return;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 4acb4031ae0..129e1ce74e2 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -15834,6 +15834,7 @@
 (include "vsx.md")
 (include "altivec.md")
 (include "mma.md")
+(include "vector-pair.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index b3ce09d523b..64655ef38b8 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -128,6 +128,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
 	$(srcdir)/config/rs6000/mma.md \
+	$(srcdir)/config/rs6000/vector-pair.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md \
diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
new file mode 100644
index 00000000000..7a81acbdc05
--- /dev/null
+++ b/gcc/config/rs6000/vector-pair.md
@@ -0,0 +1,580 @@
+;; Vector pair arithmetic support.
+;; Copyright (C) 2020-2023 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+;;
+;; This file adds support for doing vector operations on pairs of vector
+;; registers.  Most of the operations use vector pair instructions to load
+;; and possibly store registers, but split the arithmetic operation after
+;; register allocation into 2 separate vector operations.  The second
+;; scheduler pass can interleave other instructions between these pairs of
+;; instructions if possible.
+
+;; We use UNSPEC to identify the representation for the operation rather than
+;; SUBREG, because SUBREG tends to generate extra moves.
+
+(define_c_enum "unspec"
+  [UNSPEC_VPAIR_ABS
+   UNSPEC_VPAIR_DIV
+   UNSPEC_VPAIR_FMA
+   UNSPEC_VPAIR_MINUS
+   UNSPEC_VPAIR_MULT
+   UNSPEC_VPAIR_NEG
+   UNSPEC_VPAIR_PLUS
+   UNSPEC_VPAIR_SMAX
+   UNSPEC_VPAIR_SMIN
+   UNSPEC_VPAIR_ZERO
+   UNSPEC_VPAIR_SPLAT])
+
+;; Vector pair element ID that defines the scalar element within the vector pair.
+(define_c_enum "vpair_element"
+  [VPAIR_ELEMENT_FLOAT
+   VPAIR_ELEMENT_DOUBLE])
+
+(define_int_iterator VPAIR_FP_ELEMENT [VPAIR_ELEMENT_FLOAT
+				       VPAIR_ELEMENT_DOUBLE])
+
+;; Map vector pair element ID to the vector mode after the vector pair has been
+;; split.
+(define_int_attr VPAIR_VMODE [(VPAIR_ELEMENT_FLOAT  "V4SF")
+			      (VPAIR_ELEMENT_DOUBLE "V2DF")])
+
+;; Map vector pair element ID to the name used on the define_insn (in lower
+;; case).
+(define_int_attr vpair_modename [(VPAIR_ELEMENT_FLOAT  "v8sf")
+				 (VPAIR_ELEMENT_DOUBLE "v4df")])
+
+;; Unary/binary arithmetic iterator on vector pairs.
+(define_int_iterator VPAIR_FP_UNARY  [UNSPEC_VPAIR_ABS
+				      UNSPEC_VPAIR_NEG])
+
+(define_int_iterator VPAIR_FP_BINARY [UNSPEC_VPAIR_DIV
+				      UNSPEC_VPAIR_MINUS
+				      UNSPEC_VPAIR_MULT
+				      UNSPEC_VPAIR_PLUS
+				      UNSPEC_VPAIR_SMAX
+				      UNSPEC_VPAIR_SMIN])
+
+;; Map the vpair operator unspec number to the standard name.
+(define_int_attr vpair_stdname [(UNSPEC_VPAIR_ABS    "abs")
+				(UNSPEC_VPAIR_DIV    "div")
+				(UNSPEC_VPAIR_FMA    "fma")
+				(UNSPEC_VPAIR_MINUS  "sub")
+				(UNSPEC_VPAIR_MULT   "mul")
+				(UNSPEC_VPAIR_NEG    "neg")
+				(UNSPEC_VPAIR_PLUS   "add")
+				(UNSPEC_VPAIR_SMAX   "smax")
+				(UNSPEC_VPAIR_SMIN   "smin")])
+
+;; Map the vpair operator unspec number to the RTL operator.
+(define_int_attr VPAIR_OP [(UNSPEC_VPAIR_ABS    "ABS")
+			   (UNSPEC_VPAIR_DIV    "DIV")
+			   (UNSPEC_VPAIR_FMA    "FMA")
+			   (UNSPEC_VPAIR_MINUS  "MINUS")
+			   (UNSPEC_VPAIR_MULT   "MULT")
+			   (UNSPEC_VPAIR_NEG    "NEG")
+			   (UNSPEC_VPAIR_PLUS   "PLUS")
+			   (UNSPEC_VPAIR_SMAX   "SMAX")
+			   (UNSPEC_VPAIR_SMIN   "SMIN")])
+
+;; Map the scalar element ID into the appropriate insn type.
+(define_int_attr vpair_type [(VPAIR_ELEMENT_FLOAT  "vecfloat")
+			     (VPAIR_ELEMENT_DOUBLE "vecdouble")])
+
+;; Map the scalar element ID into the appropriate insn type for divide.
+(define_int_attr vpair_divtype [(VPAIR_ELEMENT_FLOAT  "vecfdiv")
+				(VPAIR_ELEMENT_DOUBLE "vecdiv")])
+
+;; Mode iterator for the vector modes that we provide splat operations for.
+(define_mode_iterator VPAIR_SPLAT_VMODE [V4SF V2DF])
+
+;; Map element mode to 128-bit vector mode for splat operations
+(define_mode_attr VPAIR_SPLAT_ELEMENT_TO_VMODE [(SF "V4SF")
+						(DF "V2DF")])
+
+;; Map either element mode or vector mode into the name for the splat insn.
+(define_mode_attr vpair_splat_name [(SF   "v8sf")
+				    (DF   "v4df")
+				    (V4SF "v8sf")
+				    (V2DF "v4df")])
+
+;; Initialize a vector pair to 0
+(define_insn_and_split "vpair_zero"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(const_int 0)] UNSPEC_VPAIR_ZERO))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 1) (match_dup 3))
+   (set (match_dup 2) (match_dup 3))]
+{
+  rtx op0 = operands[0];
+
+  operands[1] = simplify_gen_subreg (V2DFmode, op0, OOmode, 0);
+  operands[2] = simplify_gen_subreg (V2DFmode, op0, OOmode, 16);
+  operands[3] = CONST0_RTX (V2DFmode);
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "vecperm")])
+
+;; Create a vector pair with a value splat'ed (duplicated) to all of the
+;; elements.
+(define_expand "vpair_splat_<vpair_splat_name>"
+  [(use (match_operand:OO 0 "vsx_register_operand"))
+   (use (match_operand:SFDF 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  machine_mode element_mode = <MODE>mode;
+
+  if (op1 == CONST0_RTX (element_mode))
+    {
+      emit_insn (gen_vpair_zero (op0));
+      DONE;
+    }
+
+  machine_mode vector_mode = <VPAIR_SPLAT_ELEMENT_TO_VMODE>mode;
+  rtx vec = gen_reg_rtx (vector_mode);
+  unsigned num_elements = GET_MODE_NUNITS (vector_mode);
+  rtvec elements = rtvec_alloc (num_elements);
+  for (size_t i = 0; i < num_elements; i++)
+    RTVEC_ELT (elements, i) = copy_rtx (op1);
+
+  rs6000_expand_vector_init (vec, gen_rtx_PARALLEL (vector_mode, elements));
+  emit_insn (gen_vpair_splat_<vpair_splat_name>_internal (op0, vec));
+  DONE;
+})
+
+;; Inner splat support.  Operand1 is the vector splat created above.  Allow
+;; operand 1 to overlap with the output registers to eliminate one move
+;; instruction.
+(define_insn_and_split "vpair_splat_<vpair_splat_name>_internal"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(match_operand:VPAIR_SPLAT_VMODE 1 "vsx_register_operand" "0,wa")]
+	 UNSPEC_VPAIR_SPLAT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op0_a = simplify_gen_subreg (<MODE>mode, op0, OOmode, 0);
+  rtx op0_b = simplify_gen_subreg (<MODE>mode, op0, OOmode, 16);
+  rtx op1 = operands[1];
+  unsigned op1_regno = reg_or_subregno (op1);
+
+  /* Check if the input is one of the output registers.  */
+  if (op1_regno == reg_or_subregno (op0_a))
+    emit_move_insn (op0_b, op1);
+
+  else if (op1_regno == reg_or_subregno (op0_b))
+    emit_move_insn (op0_a, op1);
+
+  else
+    {
+      emit_move_insn (op0_a, op1);
+      emit_move_insn (op0_b, op1);
+    }
+
+  DONE;
+}
+  [(set_attr "length" "*,8")
+   (set_attr "type" "vecmove")])
+
+;; Vector pair unary operations.  The last argument in the UNSPEC is a
+;; CONST_INT which identifies what the scalar element is.
+(define_insn_and_split "vpair_<vpair_stdname>_<vpair_modename>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(match_operand:OO 1 "vsx_register_operand" "wa")
+	  (const_int VPAIR_FP_ELEMENT)]
+	 VPAIR_FP_UNARY))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_unary (operands, <VPAIR_VMODE>mode, <VPAIR_OP>,
+		     VPAIR_SPLIT_NORMAL);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Optimize vector pair (neg (abs)).
+(define_insn_and_split "vpair_nabs_<vpair_modename>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_ABS)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_unary (operands, <VPAIR_VMODE>mode, ABS, VPAIR_SPLIT_NEGATE);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Vector pair binary operations.  The last argument in the UNSPEC is a
+;; CONST_INT which identifies what the scalar element is.
+(define_insn_and_split "vpair_<vpair_stdname>_<vpair_modename>3"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(match_operand:OO 1 "vsx_register_operand" "wa")
+	  (match_operand:OO 2 "vsx_register_operand" "wa")
+	  (const_int VPAIR_FP_ELEMENT)]
+	 VPAIR_FP_BINARY))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_binary (operands, <VPAIR_VMODE>mode, <VPAIR_OP>);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set (attr "type") (if_then_else (match_test "<VPAIR_OP> == DIV")
+				    (const_string "<vpair_divtype>")
+				    (const_string "<vpair_type>")))])
+
+;; Optimize vector pair add of a negative value into a subtract.
+(define_insn_and_split "*vpair_add_neg_<vpair_modename>3"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(match_operand:OO 1 "vsx_register_operand" "wa")
+	  (unspec:OO
+	   [(match_operand:OO 2 "vsx_register_operand" "wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_NEG)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_PLUS))]
+  "TARGET_MMA"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(match_dup 1)
+	  (match_dup 2)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_MINUS))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Vector pair fused multiply-add (FMA) operations.  The last argument in the
+;; UNSPEC is a CONST_INT which identifies what the scalar element is.
+(define_insn_and_split "vpair_fma_<vpair_modename>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	  (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_FMA))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_fma (operands, <VPAIR_VMODE>mode, VPAIR_SPLIT_FMA);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Vector pair fused multiply-subtract
+(define_insn_and_split "vpair_fms_<vpair_modename>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	  (unspec:OO
+	   [(match_operand:OO 3 "vsx_register_operand" "0,wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_NEG)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_FMA))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_fma (operands, <VPAIR_VMODE>mode, VPAIR_SPLIT_FMS);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Vector pair negate fused multiply-add
+(define_insn_and_split "vpair_nfma_<vpair_modename>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	    (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	    (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_FMA)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_fma (operands, <VPAIR_VMODE>mode, VPAIR_SPLIT_NFMA);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Vector pair negate fused multiply-subtract
+(define_insn_and_split "vpair_nfms_<vpair_modename>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	    (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	    (unspec:OO
+	     [(match_operand:OO 3 "vsx_register_operand" "0,wa")
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_NEG)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_FMA)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  vpair_split_fma (operands, <VPAIR_VMODE>mode, VPAIR_SPLIT_NFMS);
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Optimize vector pair multiply and vector pair add into vector pair fma,
+;; provided the compiler would do this optimization for scalars and vectors.
+;; Unlike most of the define_insn_and_splits, this can be done before register
+;; allocation.
+(define_insn_and_split "*vpair_fma_<vpair_modename>_merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	    (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_MULT)
+	  (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_PLUS))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(match_dup 1)
+	  (match_dup 2)
+	  (match_dup 3)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_FMA))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Merge multiply and subtract.
+(define_insn_and_split "*vpair_fma_<vpair_modename>_merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	    (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_MULT)
+	  (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_MINUS))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(match_dup 1)
+	  (match_dup 2)
+	  (unspec:OO
+	   [(match_dup 3)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_NEG)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_FMA))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+(define_insn_and_split "*vpair_fma_<vpair_modename>_merge2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	    (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_MULT)
+	  (unspec:OO
+	   [(match_operand:OO 3 "vsx_register_operand" "0,wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_NEG)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_PLUS))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(match_dup 1)
+	  (match_dup 2)
+	  (unspec:OO
+	   [(match_dup 3)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_NEG)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_FMA))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Merge negate, multiply, and add.
+(define_insn_and_split "*vpair_nfma_<vpair_modename>_merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(unspec:OO
+	     [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_MULT)
+	    (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_PLUS)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_dup 1)
+	    (match_dup 2)
+	    (match_dup 3)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_FMA)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+;; Merge negate, multiply, and subtract.
+(define_insn_and_split "*vpair_nfms_<vpair_modename>_merge"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(unspec:OO
+	     [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_MULT)
+	    (match_operand:OO 3 "vsx_register_operand" "0,wa")
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_MINUS)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_dup 1)
+	    (match_dup 2)
+	    (unspec:OO
+	     [(match_dup 3)
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_NEG)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_FMA)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
+
+(define_insn_and_split "*vpair_nfms_<vpair_modename>_merge2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(unspec:OO
+	   [(unspec:OO
+	     [(match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_MULT)
+	    (unspec:OO
+	     [(match_operand:OO 3 "vsx_register_operand" "0,wa")
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_NEG)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_PLUS)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(unspec:OO
+	   [(match_dup 1)
+	    (match_dup 2)
+	    (unspec:OO
+	     [(match_dup 3)
+	      (const_int VPAIR_FP_ELEMENT)]
+	     UNSPEC_VPAIR_NEG)
+	    (const_int VPAIR_FP_ELEMENT)]
+	   UNSPEC_VPAIR_FMA)
+	  (const_int VPAIR_FP_ELEMENT)]
+	 UNSPEC_VPAIR_NEG))]
+{
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "<vpair_type>")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 0bc586d120e..d455d0c5624 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15827,6 +15827,7 @@ instructions, but allow the compiler to schedule those calls.
 * NDS32 Built-in Functions::
 * Nvidia PTX Built-in Functions::
 * Basic PowerPC Built-in Functions::
+* PowerPC Vector Pair Built-in Functions::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
@@ -23857,6 +23858,90 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
 @end smallexample
 
 
+@node PowerPC Vector Pair Built-in Functions
+@subsection PowerPC Vector Pair Built-in Functions
+
+GCC provides functions to speed up processing by using the type
+@code{__vector_pair} to hold two 128-bit vectors on processors that
+support ISA 3.1 (power10).  The @code{__vector_pair} type and the
+vector pair built-in functions require the MMA instruction set
+(@option{-mmma}) to be enabled, which is on by default for
+@option{-mcpu=power10}.
+
+By default, @code{__vector_pair} values are loaded into vector
+registers with a single load vector pair instruction.  Each built-in
+function is then carried out as two separate vector instructions, one
+on each of the two 128-bit vectors held in the vector pair.  A
+@code{__vector_pair} value is usually stored with a single vector pair
+store instruction.
+
+The @code{nabs} built-in applies @code{neg} to the result of
+@code{abs}.
+
+The @code{fms} built-in is @code{fma} with @code{neg} applied to the
+third argument.
+
+The @code{nfma} built-in applies @code{neg} to the result of the
+@code{fma} built-in.
+
+The @code{nfms} built-in applies @code{neg} to the result of the
+@code{fms} built-in.
+
+The following built-in function is independent of the type of the
+underlying vector:
+
+@smallexample
+__vector_pair __builtin_vpair_zero ();
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector float} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f32_abs (__vector_pair);
+__vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_div (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair,
+                                       __vector_pair);
+__vector_pair __builtin_vpair_f32_fms (__vector_pair, __vector_pair,
+                                       __vector_pair);
+__vector_pair __builtin_vpair_f32_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_nabs (__vector_pair);
+__vector_pair __builtin_vpair_f32_neg (__vector_pair);
+__vector_pair __builtin_vpair_f32_nfma (__vector_pair, __vector_pair,
+                                        __vector_pair);
+__vector_pair __builtin_vpair_f32_nfms (__vector_pair, __vector_pair,
+                                        __vector_pair);
+__vector_pair __builtin_vpair_f32_splat (float);
+__vector_pair __builtin_vpair_f32_sub (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector double} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f64_abs (__vector_pair);
+__vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_div (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair,
+                                       __vector_pair);
+__vector_pair __builtin_vpair_f64_fms (__vector_pair, __vector_pair,
+                                       __vector_pair);
+__vector_pair __builtin_vpair_f64_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_nabs (__vector_pair);
+__vector_pair __builtin_vpair_f64_neg (__vector_pair);
+__vector_pair __builtin_vpair_f64_nfma (__vector_pair, __vector_pair,
+                                        __vector_pair);
+__vector_pair __builtin_vpair_f64_nfms (__vector_pair, __vector_pair,
+                                        __vector_pair);
+__vector_pair __builtin_vpair_f64_splat (double);
+__vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair);
+@end smallexample
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
new file mode 100644
index 00000000000..a6dbc457639
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
@@ -0,0 +1,87 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected instructions for
+   vector pairs with 4 double elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvadddp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvsubdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_sub (*x, *y);
+}
+
+void
+test_multiply (__vector_pair *dest,
+	       __vector_pair *x,
+	       __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmuldp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_mul (*x, *y);
+}
+
+void
+test_min (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmindp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_min (*x, *y);
+}
+
+void
+test_max (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmaxdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_max (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnegdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_neg (*x);
+}
+
+void
+test_abs (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvabsdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_abs (*x);
+}
+
+void
+test_negative_abs (__vector_pair *dest,
+		   __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnabsdp, 1 stxvp.  */
+  __vector_pair ab = __builtin_vpair_f64_abs (*x);
+  *dest = __builtin_vpair_f64_neg (ab);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     13 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     8 } } */
+/* { dg-final { scan-assembler-times {\mxvabsdp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvadddp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnabsdp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegdp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvsubdp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c
new file mode 100644
index 00000000000..d2ee4dd0dd9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -Ofast -ffp-contract=fast" } */
+
+/* Test whether the vector builtin code merges multiply, add/subtract, and
+   negate into fma operations.  */
+
+void
+test_fma (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmadd{a,m}sp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  *p = __builtin_vpair_f32_add (mul, *s);
+}
+
+void
+test_fms (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmsub{a,m}sp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f32_neg (*s);
+  *p = __builtin_vpair_f32_add (mul, neg);
+}
+
+void
+test_nfma (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvnmadd{a,m}sp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair muladd = __builtin_vpair_f32_add (mul, *s);
+  *p = __builtin_vpair_f32_neg (muladd);
+}
+
+void
+test_nfms (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvnmsub{a,m}sp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f32_neg (*s);
+  __vector_pair muladd = __builtin_vpair_f32_add (mul, neg);
+  *p = __builtin_vpair_f32_neg (muladd);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M}  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c
new file mode 100644
index 00000000000..e635b599aed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffp-contract=off" } */
+
+/* Test that the vector builtin code does not merge multiply, add/subtract,
+   and negate into fma operations if -ffp-contract is off.  */
+
+void
+test_fma (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmuldp, 2 xvadddp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  *p = __builtin_vpair_f64_add (mul, *s);
+}
+
+void
+test_fms (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmuldp, 2 xvsubdp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f64_neg (*s);
+  *p = __builtin_vpair_f64_add (mul, neg);
+}
+
+void
+test_nfma (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvmuldp, 2 xvadddp, 2 xvnegdp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair muladd = __builtin_vpair_f64_add (mul, *s);
+  *p = __builtin_vpair_f64_neg (muladd);
+}
+
+void
+test_nfms (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvmuldp, 2 xvsubdp, 2 xvnegdp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f64_neg (*s);
+  __vector_pair muladd = __builtin_vpair_f64_add (mul, neg);
+  *p = __builtin_vpair_f64_neg (muladd);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvadddp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M}     8 } } */
+/* { dg-final { scan-assembler-times {\mxvnegdp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxvsubdp\M}     4 } } */
+/* { dg-final { scan-assembler-not   {\mxvmadd.dp\M}     } } */
+/* { dg-final { scan-assembler-not   {\mxvmsub.dp\M}     } } */
+/* { dg-final { scan-assembler-not   {\mxvnmadd.dp\M}    } } */
+/* { dg-final { scan-assembler-not   {\mxvnmsub.dp\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c
new file mode 100644
index 00000000000..4997279473e
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -ffp-contract=off" } */
+
+/* Test that the vector builtin code does not merge multiply, add/subtract,
+   and negate into fma operations if -ffp-contract is off.  */
+
+void
+test_fma (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmulsp, 2 xvaddsp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  *p = __builtin_vpair_f32_add (mul, *s);
+}
+
+void
+test_fms (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmulsp, 2 xvsubsp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f32_neg (*s);
+  *p = __builtin_vpair_f32_add (mul, neg);
+}
+
+void
+test_nfma (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvmulsp, 2 xvaddsp, 2 xvnegsp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair muladd = __builtin_vpair_f32_add (mul, *s);
+  *p = __builtin_vpair_f32_neg (muladd);
+}
+
+void
+test_nfms (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvmulsp, 2 xvsubsp, 2 xvnegsp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f32_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f32_neg (*s);
+  __vector_pair muladd = __builtin_vpair_f32_add (mul, neg);
+  *p = __builtin_vpair_f32_neg (muladd);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxvmulsp\M}     8 } } */
+/* { dg-final { scan-assembler-times {\mxvnegsp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxvsubsp\M}     4 } } */
+/* { dg-final { scan-assembler-not   {\mxvmadd.sp\M}     } } */
+/* { dg-final { scan-assembler-not   {\mxvmsub.sp\M}     } } */
+/* { dg-final { scan-assembler-not   {\mxvnmadd.sp\M}    } } */
+/* { dg-final { scan-assembler-not   {\mxvnmsub.sp\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c
new file mode 100644
index 00000000000..2f663c5780c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c
@@ -0,0 +1,86 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected instructions for
+   vector pairs with 8 float elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvaddsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvsubsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_sub (*x, *y);
+}
+
+void
+test_multiply (__vector_pair *dest,
+	       __vector_pair *x,
+	       __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmulsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_mul (*x, *y);
+}
+
+void
+test_max (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmaxsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_max (*x, *y);
+}
+
+void
+test_min (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvminsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnegsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_neg (*x);
+}
+
+void
+test_abs (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvabssp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_abs (*x);
+}
+
+void
+test_negative_abs (__vector_pair *dest,
+		   __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnabssp, 1 stxvp.  */
+  __vector_pair ab = __builtin_vpair_f32_abs (*x);
+  *dest = __builtin_vpair_f32_neg (ab);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     13 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     8 } } */
+/* { dg-final { scan-assembler-times {\mxvabssp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmulsp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnabssp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegsp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c
new file mode 100644
index 00000000000..43b91461759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected FMA instructions
+   for vector pairs with 4 double elements.  */
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_fma (*x, *y, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f64_neg (*z);
+  *dest = __builtin_vpair_f64_fma (*x, *y, n);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp.  */
+  __vector_pair w = __builtin_vpair_f64_fma (*x, *y, *z);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f64_neg (*z);
+  __vector_pair w = __builtin_vpair_f64_fma (*x, *y, n);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.dp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c
new file mode 100644
index 00000000000..d5c55d3883c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected FMA instructions
+   for vector pairs with 8 float elements.  */
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_fma (*x, *y, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f32_neg (*z);
+  *dest = __builtin_vpair_f32_fma (*x, *y, n);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp.  */
+  __vector_pair w = __builtin_vpair_f32_fma (*x, *y, *z);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f32_neg (*z);
+  __vector_pair w = __builtin_vpair_f32_fma (*x, *y, n);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
new file mode 100644
index 00000000000..9b645e626e1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected instructions for
+   the vector pair zero and splat functions on vector pairs containing
+   doubles.  */
+
+void
+test_zero (__vector_pair *p)
+{
+  /* 2 xxspltib/xxlxor.  */
+  *p = __builtin_vpair_zero ();
+}
+
+void
+test_splat_zero (__vector_pair *p)
+{
+  /* 2 xxspltib/xxlxor.  */
+  *p = __builtin_vpair_f64_splat (0.0);
+}
+
+void
+test_splat_one (__vector_pair *p)
+{
+  /* xxspltidp, xxlor.  */
+  *p = __builtin_vpair_f64_splat (1.0);
+}
+
+void
+test_splat_pi (__vector_pair *p)
+{
+  /* plxv, xxlor (note, we cannot use xxspltidp).  */
+  *p = __builtin_vpair_f64_splat (3.1415926535);
+}
+
+void
+test_splat_arg (__vector_pair *p, double x)
+{
+  /* xxpermdi, xxlor.  */
+  *p = __builtin_vpair_f64_splat (x);
+}
+
+void
+test_splat_mem (__vector_pair *p, double *q)
+{
+  /* lxvdsx, xxlor.  */
+  *p = __builtin_vpair_f64_splat (*q);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvdsx\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mp?lxvx?\M}             1 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}               6 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M}            1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M|\mxxlxor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M}           1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
new file mode 100644
index 00000000000..5ec53d4bfc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
@@ -0,0 +1,56 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code generates the expected instructions for
+   the vector pair zero and splat functions on vector pairs containing
+   floats.  */
+
+void
+test_zero (__vector_pair *p)
+{
+  /* 2 xxspltib/xxlxor.  */
+  *p = __builtin_vpair_zero ();
+}
+
+void
+test_splat_zero (__vector_pair *p)
+{
+  /* 2 xxspltib/xxlxor.  */
+  *p = __builtin_vpair_f32_splat (0.0f);
+}
+
+void
+test_splat_one (__vector_pair *p)
+{
+  /* xxspltiw, xxlor.  */
+  *p = __builtin_vpair_f32_splat (1.0f);
+}
+
+void
+test_splat_pi (__vector_pair *p)
+{
+  /* xxspltiw, xxlor.  */
+  *p = __builtin_vpair_f32_splat (3.1415926535f);
+}
+
+void
+test_splat_arg (__vector_pair *p, float x)
+{
+  /* xscvdpspn, xxspltw, xxlor.  */
+  *p = __builtin_vpair_f32_splat (x);
+}
+
+void
+test_splat_mem (__vector_pair *p, float *q)
+{
+  /* lxvwsx, xxlor.  */
+  *p = __builtin_vpair_f32_splat (*q);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvwsx\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}               6 } } */
+/* { dg-final { scan-assembler-times {\mxscvdpspn\M}           1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M|\mxxlxor\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}            2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltw\M}             1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
new file mode 100644
index 00000000000..51a400cb4b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code merges plus and neg into a minus
+   operation.  */
+
+void
+test_minus (__vector_pair *p, __vector_pair *q, __vector_pair *r)
+{
+  *p = __builtin_vpair_f64_add (*q, __builtin_vpair_f64_neg (*r));
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    1 } } */
+/* { dg-final { scan-assembler-times {\mxvsubdp\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxvadddp\M}    } } */
+/* { dg-final { scan-assembler-not   {\mxvnegdp\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
new file mode 100644
index 00000000000..67957e3bdea
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector builtin code merges plus and neg into a minus
+   operation.  */
+
+void
+test_minus (__vector_pair *p, __vector_pair *q, __vector_pair *r)
+{
+  *p = __builtin_vpair_f32_add (*q, __builtin_vpair_f32_neg (*r));
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    1 } } */
+/* { dg-final { scan-assembler-times {\mxvsubsp\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxvaddsp\M}    } } */
+/* { dg-final { scan-assembler-not   {\mxvnegsp\M}    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c
new file mode 100644
index 00000000000..eacf8dae9d8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c
@@ -0,0 +1,61 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -Ofast -ffp-contract=fast" } */
+
+/* Test whether the vector builtin code merges multiply, add/subtract, and
+   negate into fma operations.  */
+
+void
+test_fma (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmadd{a,m}dp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  *p = __builtin_vpair_f64_add (mul, *s);
+}
+
+void
+test_fms (__vector_pair *p,
+	  __vector_pair *q,
+	  __vector_pair *r,
+	  __vector_pair *s)
+{
+  /* lxvp, 2 xvmsub{a,m}dp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f64_neg (*s);
+  *p = __builtin_vpair_f64_add (mul, neg);
+}
+
+void
+test_nfma (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvnmadd{a,m}dp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair muladd = __builtin_vpair_f64_add (mul, *s);
+  *p = __builtin_vpair_f64_neg (muladd);
+}
+
+void
+test_nfms (__vector_pair *p,
+	   __vector_pair *q,
+	   __vector_pair *r,
+	   __vector_pair *s)
+{
+  /* lxvp, 2 xvnmsub{a,m}dp, stxvp.  */
+  __vector_pair mul = __builtin_vpair_f64_mul (*q, *r);
+  __vector_pair neg = __builtin_vpair_f64_neg (*s);
+  __vector_pair muladd = __builtin_vpair_f64_add (mul, neg);
+  *p = __builtin_vpair_f64_neg (muladd);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M}  2 } } */