public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work148-vpair)] <intro for the vector pair built-in patches>
@ 2023-11-28  6:06 Michael Meissner
  0 siblings, 0 replies; only message in thread
From: Michael Meissner @ 2023-11-28  6:06 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:766acc67bdbe5e00784e294602a82e674c728626

commit 766acc67bdbe5e00784e294602a82e674c728626
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Nov 28 01:06:35 2023 -0500

    <intro for the vector pair built-in patches>
    
    This set of patches adds support for using the vector pair load instructions
    (lxvp, plxvp, and lxvpx) and the vector pair store instructions (stxvp,
    pstxvp, and stxvpx) that were introduced with ISA 3.1 on Power10 systems.
    
    In GCC 13, the only use of vector pairs (and vector quads) was to feed the
    MMA subsystem.  These patches do not use the MMA subsystem; instead, they
    give users a way to write code that is extremely memory bandwidth intensive.
    
    There are two main ways to add vector pair support to the GCC compiler:
    built-in functions vs. __attribute__((__vector_size__(32))).
    
    The first method is to add a set of built-in functions that use the vector
    pair type (__vector_pair), which allows the user to write loops and similar
    code using that type.  Loads are normally done using the load vector pair
    instructions.  The operation is then split after reload into two independent
    vector operations on the two 128-bit vectors located in the vector pair.
    When the value is stored, a store vector pair instruction is normally used.
    By keeping the value within a vector pair through register allocation, the
    compiler does not generate extra move instructions that can slow down the
    loop.
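    As a sketch of this first method, a multiply-add loop might look as
    follows.  This is an illustration only: it assumes the built-in names and
    signatures added later in this series (__builtin_vpair_f64_fma) together
    with the existing __builtin_vsx_lxvp/__builtin_vsx_stxvp built-ins, and it
    requires compiling for power10 with MMA support.

```c
/* Sketch only: value[i] += a[i] * b[i], 4 doubles per iteration.
   Compile with -mcpu=power10 -mmma.  __builtin_vpair_f64_fma is a
   built-in added by this patch series; its exact name and signature
   are assumptions taken from these patches.  */
#if defined(_ARCH_PWR10) && defined(__MMA__)
void
daxpy_pair (double *value, const double *a, const double *b,
            unsigned long n)
{
  for (unsigned long i = 0; i + 4 <= n; i += 4)
    {
      /* Each lxvp brings in two 128-bit vectors in one instruction.  */
      __vector_pair va
        = __builtin_vsx_lxvp (0, (const __vector_pair *) &a[i]);
      __vector_pair vb
        = __builtin_vsx_lxvp (0, (const __vector_pair *) &b[i]);
      __vector_pair vv
        = __builtin_vsx_lxvp (0, (const __vector_pair *) &value[i]);
      /* Split after reload into two independent vector fma insns.  */
      __vector_pair vr = __builtin_vpair_f64_fma (va, vb, vv);
      __builtin_vsx_stxvp (vr, 0, (const __vector_pair *) &value[i]);
    }
}
#endif
```

This is the 3-loads-and-a-store shape used by the saxpy benchmark described
below; the value stays in a vector pair from load to store, so no extra move
instructions are generated.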
    
    The second method is to add support for the V4DF, V8SF, etc. types.  By
    doing so, you can use __attribute__((__vector_size__(32))) to declare
    variables that are vector pairs, and the GCC compiler will generate the
    appropriate code.  I implemented a limited prototype of this support, but it
    has some problems that I haven't addressed.  One potential problem with
    using the 32-byte vector size is that it can generate worse code for
    operations that aren't directly supported, as the compiler unpacks the
    values and re-packs them.  The compiler would also generate these unpacks
    and packs if you are generating code for a power9 system.  There are a
    number of test cases that fail with my prototype implementation that I
    haven't addressed yet.
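    For comparison, this second method builds on GCC's generic vector
    extension, which already compiles on any target (without the prototype, a
    32-byte vector is simply split into smaller hardware vectors).  A minimal,
    target-independent illustration:

```c
#include <assert.h>

/* A 32-byte vector of 4 doubles via __attribute__((__vector_size__(32))).
   With the prototype described above this could map onto a vector pair on
   power10; elsewhere GCC unpacks it into smaller hardware vectors.  */
typedef double v4df __attribute__ ((__vector_size__ (32)));

static inline v4df
v4df_fma (v4df a, v4df b, v4df c)
{
  /* Element-wise multiply-add; no target intrinsics needed.  */
  return a * b + c;
}
```

The appeal of this approach is that ordinary arithmetic operators work on the
type; the cost, as noted above, is the unpack/re-pack code for operations the
32-byte support doesn't cover.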
    
    After discussions within our group, it was decided that using built-in
    functions is the way to go at this time, and these patches implement those
    functions.
    
    In terms of benchmarks, I wrote two benchmarks:
    
       1)   One benchmark is a saxpy type loop: value[i] += (a[i] * b[i]).  That
            is a loop with 3 loads and a store per iteration.
    
       2)   Another benchmark produces a scalar sum of an entire vector.  This
            is a loop that has just a single load and no store.
    
    For the saxpy type loop, I get the following general numbers for both float and
    double:
    
       1)   The vector pair built-in functions are roughly 10% faster than using
            normal vector processing.
    
       2)   The vector pair built-in functions are roughly 19-20% faster than
            if I write the loop using vector pair loads via the existing
            built-ins, and then manually split the values and do the arithmetic
            and single vector stores.
    
       3)   The vector pair built-in functions are roughly 35-40% faster than if I
            write the loop using the existing built-ins for both vector pair load
            and vector pair store.  If I apply the patches that Peter Bergner has
            been writing for PR target/109116, then it improves the speed of the
            existing built-ins for assembling and disassembling vector pairs.  In
            this case, the vector pair built-in functions are 20-25% faster,
            instead of 35-40% faster.  This is due to the patch eliminating extra
            vector moves.
    
    Unfortunately, for floating point, doing the sum of the whole vector is
    slower using the new vector pair built-in functions in a simple loop
    (compared to using the existing built-ins for disassembling vector pairs).
    If I write more complex loops that manually unroll the loop, then the
    floating point vector pair built-in functions perform like the integer
    vector pair built-in functions.  So there is some amount of tuning that
    will need to be done.
    
    There are 4 patches within this group of patches.
    
        1)  The first patch adds vector pair support for 32-bit and 64-bit
            floating point operations.  The operations provided are absolute
            value, addition, fused multiply-add, minimum, maximum,
            multiplication, negation, and subtraction.  I did not add divide or
            square root because these instructions take long enough to compute
            that you don't get any advantage from using the vector pair
            load/store instructions.
    
        2)  The second patch adds vector pair support for 8-bit, 16-bit,
            32-bit, and 64-bit integer operations.  The operations provided
            include addition, bitwise and, bitwise inclusive or, bitwise
            exclusive or, bitwise not, both signed and unsigned
            minimum/maximum, negation, and subtraction.  I did not add multiply
            because the PowerPC architecture does not provide single
            instructions to do integer vector multiply on the whole vector.  I
            could add shifts and rotates, but I didn't think memory intensive
            code used these operations.
    
        3)  The third patch adds methods to create vector pair values (zero,
            splat from a scalar value, and combine two 128-bit vectors), as
            well as a convenient method to extract one 128-bit vector from a
            vector pair.
    
        4)  The fourth patch adds horizontal addition for 32-bit floating
            point, 64-bit floating point, and 64-bit integers.  I do wonder if
            there are more horizontal reductions that should be done.
    
    I have built and tested these patches on:
    
        *   A little endian power10 server using --with-cpu=power10
        *   A little endian power9 server using --with-cpu=power9
        *   A big endian power9 server using --with-cpu=power9.
    
    Can I check these patches into the master branch?
    
    ====================
    
    Add support for floating point vector pair built-in functions.
    
    This patch adds a series of built-in functions to allow users to write code to
    do a number of simple operations where the loop is done using the __vector_pair
    type.  The __vector_pair type is an opaque type.  These built-in functions keep
    the two 128-bit vectors within the __vector_pair together, and split the
    operation after register allocation.
    
    This patch provides vector pair operations for 32-bit floating point and 64-bit
    floating point.
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_*): Add vector
            pair built-in functions for float.
            (__builtin_vpair_f64_*): Add vector pair built-in functions for double.
            * config/rs6000/rs6000-protos.h (split_unary_vector_pair): Add
            declaration.
            (split_binary_vector_pair): Likewise.
            (split_fma_vector_pair): Likewise.
            * config/rs6000/rs6000.cc (split_unary_vector_pair): New helper function
            for vector pair built-in functions.
            (split_binary_vector_pair): Likewise.
            (split_fma_vector_pair): Likewise.
            * config/rs6000/rs6000.md (toplevel): Include vector-pair.md.
            * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
            * config/rs6000/vector-pair.md: New file.
            * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
            floating point and general vector pair built-in functions.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-1.c: New test.
            * gcc.target/powerpc/vector-pair-2.c: New test.
            * gcc.target/powerpc/vector-pair-3.c: New test.
            * gcc.target/powerpc/vector-pair-4.c: New test.
    
    ====================
    
    Add support for integer vector pair built-in functions.
    
    This patch adds a series of built-in functions to allow users to write code to
    do a number of simple operations where the loop is done using the __vector_pair
    type.  The __vector_pair type is an opaque type.  These built-in functions keep
    the two 128-bit vectors within the __vector_pair together, and split the
    operation after register allocation.
    
    This patch provides vector pair operations for 8-bit, 16-bit, 32-bit, and
    64-bit integers.
    
    I have built and tested these patches on:
    
        *   A little endian power10 server using --with-cpu=power10
        *   A little endian power9 server using --with-cpu=power9
        *   A big endian power9 server using --with-cpu=power9.
    
    Can I check this patch into the master branch after the preceding patch is
    checked in?
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_i8*): Add built-in
            functions for integer vector pairs.
            (__builtin_vpair_i16*): Likewise.
            (__builtin_vpair_i32*): Likewise.
            (__builtin_vpair_i64*): Likewise.
            * config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
            (UNSPEC_VPAIR_V16HI): Likewise.
            (UNSPEC_VPAIR_V8SI): Likewise.
            (UNSPEC_VPAIR_V4DI): Likewise.
            (VP_INT_BINARY): New iterator for integer vector pair.
            (vp_insn): Add support for integer vector pairs.
            (vp_ireg): New code attribute for integer vector pairs.
            (vp_ipredicate): Likewise.
            (VP_INT): New int iterator for integer vector pairs.
            (VP_VEC_MODE): Likewise.
            (vp_pmode): Likewise.
            (vp_vmode): Likewise.
            (vp_neg_reg): New int iterator for integer vector pairs.
            (vpair_neg_<vp_pmode>): Add integer vector pair support insns.
            (vpair_not_<vp_pmode>2): Likewise.
            (vpair_<vp_insn>_<vp_pmode>3): Likewise.
            (vpair_andc_<vp_pmode>): Likewise.
            (*vpair_iorc_<vp_pmode>): Likewise.
            (vpair_nand_<vp_pmode>_1): Likewise.
            (vpair_nand_<vp_pmode>_2): Likewise.
            (vpair_nor_<vp_pmode>_1): Likewise.
            (vpair_nor_<vp_pmode>_2): Likewise.
            * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
            integer vector pair built-in functions.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-5.c: New test.
            * gcc.target/powerpc/vector-pair-6.c: New test.
            * gcc.target/powerpc/vector-pair-7.c: New test.
            * gcc.target/powerpc/vector-pair-8.c: New test.
    
    ====================
    
    Add support for initializing and extracting from vector pairs.
    
    This patch adds a series of built-in functions to allow users to write code to
    do a number of simple operations where the loop is done using the __vector_pair
    type.  The __vector_pair type is an opaque type.  These built-in functions keep
    the two 128-bit vectors within the __vector_pair together, and split the
    operation after register allocation.
    
    This patch provides vector pair operations for loading up a vector pair
    with all 0's, duplicating (splat) from a scalar type, or combining two
    vectors into a vector pair.  This patch also provides vector pair built-ins
    to extract one 128-bit vector from a vector pair.
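    A short sketch of how the creation and extraction built-ins might be used
    together.  The names, signatures, and the meaning of the selector argument
    are assumptions taken from this patch series, and the code is power10 only.

```c
/* Sketch only: splat a scalar across a vector pair, then pull out
   each 128-bit half.  Built-in names are from this patch series and
   are assumptions here.  Compile with -mcpu=power10 -mmma.  */
#if defined(_ARCH_PWR10) && defined(__MMA__)
__vector double
pair_halves_sum (double x)
{
  __vector_pair p = __builtin_vpair_f64_splat (x);
  /* The second argument (a constant) selects which 128-bit vector
     of the pair to extract.  */
  __vector double lo = __builtin_vpair_f64_extract_vector (p, 0);
  __vector double hi = __builtin_vpair_f64_extract_vector (p, 1);
  return lo + hi;   /* Each element should be 2*x.  */
}
#endif
```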
    
    I have built and tested these patches on:
    
        *   A little endian power10 server using --with-cpu=power10
        *   A little endian power9 server using --with-cpu=power9
        *   A big endian power9 server using --with-cpu=power9.
    
    Can I check this patch into the master branch after the preceding patches
    have been checked in?
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (mma_assemble_input_operand): Allow any
            16-byte vector, not just V16QImode.
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_zero): New vector
            pair initialization built-in functions.
            (__builtin_vpair_*_assemble): Likewise.
            (__builtin_vpair_*_splat): Likewise.
            (__builtin_vpair_*_extract_vector): New vector pair extraction built-in
            functions.
            * config/rs6000/vector-pair.md (UNSPEC_VPAIR_V32QI): New unspec.
            (UNSPEC_VPAIR_V16HI): Likewise.
            (UNSPEC_VPAIR_V8SI): Likewise.
            (UNSPEC_VPAIR_V4DI): Likewise.
            (VP_INT_BINARY): New iterator for integer vector pair.
            (vp_insn): Add support for integer vector pairs.
            (vp_ireg): New code attribute for integer vector pairs.
            (vp_ipredicate): Likewise.
            (VP_INT): New int iterator for integer vector pairs.
            (VP_VEC_MODE): Likewise.
            (vp_pmode): Likewise.
            (vp_vmode): Likewise.
            (vp_neg_reg): New int iterator for integer vector pairs.
            (vpair_neg_<vp_pmode>): Add integer vector pair support insns.
            (vpair_not_<vp_pmode>2): Likewise.
            (vpair_<vp_insn>_<vp_pmode>3): Likewise.
            (vpair_andc_<vp_pmode>): Likewise.
            (vpair_iorc_<vp_pmode>): Likewise.
            (vpair_nand_<vp_pmode>_1): Likewise.
            (vpair_nand_<vp_pmode>_2): Likewise.
            (vpair_nor_<vp_pmode>_1): Likewise.
            (vpair_nor_<vp_pmode>_2): Likewise.
            * doc/extend.texi (PowerPC Vector Pair Built-in Functions): Document the
            integer vector pair built-in functions.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-5.c: New test.
            * gcc.target/powerpc/vector-pair-6.c: New test.
            * gcc.target/powerpc/vector-pair-7.c: New test.
            * gcc.target/powerpc/vector-pair-8.c: New test.
    
    ====================
    
    Add support for doing a horizontal add on vector pair elements.
    
    This patch adds a series of built-in functions to allow users to write code to
    do a number of simple operations where the loop is done using the __vector_pair
    type.  The __vector_pair type is an opaque type.  These built-in functions keep
    the two 128-bit vectors within the __vector_pair together, and split the
    operation after register allocation.
    
    This patch provides vector pair built-in functions to do a horizontal add
    on vector pair elements.  Only floating point and 64-bit integer horizontal
    adds are provided in this patch.
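    A sketch of the intended use, assuming the built-in names from this patch
    series (power10 only).  As noted in the intro, this simple-loop form can be
    slower for floating point than a manually unrolled variant:

```c
/* Sketch only: sum n doubles (n a multiple of 4) with one lxvp load
   and one horizontal add per iteration.  __builtin_vpair_f64_add_elements
   is a built-in added by this patch series; compile with
   -mcpu=power10 -mmma.  */
#if defined(_ARCH_PWR10) && defined(__MMA__)
double
sum_f64 (const double *p, unsigned long n)
{
  double sum = 0.0;
  for (unsigned long i = 0; i + 4 <= n; i += 4)
    {
      __vector_pair v
        = __builtin_vsx_lxvp (0, (const __vector_pair *) &p[i]);
      /* Adds all four 64-bit elements of the pair into a scalar.  */
      sum += __builtin_vpair_f64_add_elements (v);
    }
  return sum;
}
#endif
```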
    
    I have built and tested these patches on:
    
        *   A little endian power10 server using --with-cpu=power10
        *   A little endian power9 server using --with-cpu=power9
        *   A big endian power9 server using --with-cpu=power9.
    
    Can I check this patch into the master branch after the preceding patches
    have been checked in?
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair_f32_add_elements):
            New built-in function.
            (__builtin_vpair_f64_add_elements): Likewise.
            (__builtin_vpair_i64_add_elements): Likewise.
            (__builtin_vpair_i64u_add_elements): Likewise.
            * config/rs6000/vector-pair.md (UNSPEC_VPAIR_REDUCE_PLUS_F32): New
            unspec.
            (UNSPEC_VPAIR_REDUCE_PLUS_F64): Likewise.
            (UNSPEC_VPAIR_REDUCE_PLUS_I64): Likewise.
            (vpair_reduc_plus_scale_v8sf): New insn.
            (vpair_reduc_plus_scale_v4df): Likewise.
            (vpair_reduc_plus_scale_v4di): Likewise.
            * doc/extend.texi (__builtin_vpair_f32_add_elements): Document.
            (__builtin_vpair_f64_add_elements): Likewise.
            (__builtin_vpair_i64_add_elements): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-pair-16.c: New test.
    
    ====================
    
    Add overloads for __builtin_vpair_assemble.
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-overload.def (__builtin_vpair_assemble): Add
            overloads.
    
    ====================
    
    Rename things so it can be combined with the vsize branch.
    
    2023-11-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/rs6000-builtins.def (__builtin_vpair*): Rename all insn
            names from VPAIR... to VPAIR_FUNC... to allow building the combined
            vsubreg branch.
            * config/rs6000/rs6000-overload.def (__builtin_vpair*): Likewise.
            * config/rs6000/rs6000.md (toplevel): Include vpair-func.md instead of
            vector-pair.md.
            * config/rs6000/t-rs6000 (MD_INCLUDES): Change vector-pair.md to
            vpair-func.md.
            * config/rs6000/vpair-func.md: Rename from vector-pair.md to
            vpair-func.md.  Change all VPAIR names to be VPAIR_FUNC.

Diff:
---
 gcc/config/rs6000/predicates.md                   |   2 +-
 gcc/config/rs6000/rs6000-builtins.def             | 303 ++++++++
 gcc/config/rs6000/rs6000-overload.def             |  22 +
 gcc/config/rs6000/rs6000-protos.h                 |   5 +
 gcc/config/rs6000/rs6000.cc                       |  74 ++
 gcc/config/rs6000/rs6000.md                       |   1 +
 gcc/config/rs6000/t-rs6000                        |   1 +
 gcc/config/rs6000/vpair-func.md                   | 881 ++++++++++++++++++++++
 gcc/doc/extend.texi                               | 165 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-1.c  | 135 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-10.c |  86 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-11.c |  84 +++
 gcc/testsuite/gcc.target/powerpc/vector-pair-12.c | 156 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-13.c | 139 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-14.c | 141 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-15.c | 139 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-16.c |  45 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-2.c  | 134 ++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-3.c  |  60 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-4.c  |  60 ++
 gcc/testsuite/gcc.target/powerpc/vector-pair-5.c  | 193 +++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-6.c  | 193 +++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-7.c  | 193 +++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-8.c  | 194 +++++
 gcc/testsuite/gcc.target/powerpc/vector-pair-9.c  |  13 +
 25 files changed, 3418 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index ef7d3f214c4..922a77716c4 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -1301,7 +1301,7 @@
 
 ;; Return 1 if this operand is valid for a MMA assemble accumulator insn.
 (define_special_predicate "mma_assemble_input_operand"
-  (match_test "(mode == V16QImode
+  (match_test "(VECTOR_MODE_P (mode) && GET_MODE_SIZE (mode) == 16
 		&& (vsx_register_operand (op, mode)
 		    || (MEM_P (op)
 			&& (indexed_or_indirect_address (XEXP (op, 0), mode)
diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
index ce40600e803..c66923e3c50 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -4131,3 +4131,306 @@
 
   void __builtin_vsx_stxvp (v256, unsigned long, const v256 *);
     STXVP nothing {mma,pair}
+
+;; General vector pair built-in functions
+
+  v256 __builtin_vpair_zero ();
+    VPAIR_FUNC_ZERO vpair_func_zero {mma}
+
+;; vector pair built-in functions for 8 32-bit float values
+
+  v256 __builtin_vpair_f32_abs (v256);
+    VPAIR_FUNC_F32_ABS vpair_func_abs_v8sf2 {mma,pair}
+
+  v256 __builtin_vpair_f32_add (v256, v256);
+    VPAIR_FUNC_F32_ADD vpair_func_add_v8sf3 {mma,pair}
+
+  float __builtin_vpair_f32_add_elements (v256);
+    VPAIR_FUNC_F32_ADD_ELEMENTS vpair_func_reduc_plus_scale_v8sf {mma,pair}
+
+  v256 __builtin_vpair_f32_assemble (vf, vf);
+    VPAIR_FUNC_F32_ASSEMBLE vpair_func_assemble_v8sf {mma,pair}
+
+  vf __builtin_vpair_f32_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_F32_EXTRACT_VECTOR vpair_func_extract_vector_v8sf {mma,pair}
+
+  v256 __builtin_vpair_f32_fma (v256, v256, v256);
+    VPAIR_FUNC_F32_FMA vpair_func_fma_v8sf4 {mma,pair}
+
+  v256 __builtin_vpair_f32_max (v256, v256);
+    VPAIR_FUNC_F32_MAX vpair_func_smax_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_min (v256, v256);
+    VPAIR_FUNC_F32_MIN vpair_func_smin_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_mul (v256, v256);
+    VPAIR_FUNC_F32_MUL vpair_func_mul_v8sf3 {mma,pair}
+
+  v256 __builtin_vpair_f32_neg (v256);
+    VPAIR_FUNC_F32_NEG vpair_func_neg_v8sf2 {mma,pair}
+
+  v256 __builtin_vpair_f32_splat (float);
+    VPAIR_FUNC_F32_SPLAT vpair_func_splat_v8sf {mma,pair}
+
+  v256 __builtin_vpair_f32_sub (v256, v256);
+    VPAIR_FUNC_F32_SUB vpair_func_sub_v8sf3 {mma,pair}
+
+;; vector pair built-in functions for 4 64-bit double values
+
+  v256 __builtin_vpair_f64_abs (v256);
+    VPAIR_FUNC_F64_ABS vpair_func_abs_v4df2 {mma,pair}
+
+  v256 __builtin_vpair_f64_add (v256, v256);
+    VPAIR_FUNC_F64_ADD vpair_func_add_v4df3 {mma,pair}
+
+  double __builtin_vpair_f64_add_elements (v256);
+    VPAIR_FUNC_F64_ADD_ELEMENTS vpair_func_reduc_plus_scale_v4df {mma,pair}
+
+  v256 __builtin_vpair_f64_assemble (vd, vd);
+    VPAIR_FUNC_F64_ASSEMBLE vpair_func_assemble_v4df {mma,pair}
+
+  vd __builtin_vpair_f64_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_F64_EXTRACT_VECTOR vpair_func_extract_vector_v4df {mma,pair}
+
+  v256 __builtin_vpair_f64_fma (v256, v256, v256);
+    VPAIR_FUNC_F64_FMA vpair_func_fma_v4df4 {mma,pair}
+
+  v256 __builtin_vpair_f64_max (v256, v256);
+    VPAIR_FUNC_F64_MAX vpair_func_smax_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_min (v256, v256);
+    VPAIR_FUNC_F64_MIN vpair_func_smin_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_mul (v256, v256);
+    VPAIR_FUNC_F64_MUL vpair_func_mul_v4df3 {mma,pair}
+
+  v256 __builtin_vpair_f64_neg (v256);
+    VPAIR_FUNC_F64_NEG vpair_func_neg_v4df2 {mma,pair}
+
+  v256 __builtin_vpair_f64_splat (double);
+    VPAIR_FUNC_F64_SPLAT vpair_func_splat_v4df {mma,pair}
+
+  v256 __builtin_vpair_f64_sub (v256, v256);
+    VPAIR_FUNC_F64_SUB vpair_func_sub_v4df3 {mma,pair}
+
+;; vector pair built-in functions for 32 8-bit unsigned char or
+;; signed char values
+
+  v256 __builtin_vpair_i8_add (v256, v256);
+    VPAIR_FUNC_I8_ADD vpair_func_add_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_and (v256, v256);
+    VPAIR_FUNC_I8_AND vpair_func_and_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_assemble (vsc, vsc);
+    VPAIR_FUNC_I8_ASSEMBLE vpair_func_assemble_v32qi {mma,pair}
+
+  vsc __builtin_vpair_i8_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I8_EXTRACT_VECTOR vpair_func_extract_vector_v32qi {mma,pair}
+
+  v256 __builtin_vpair_i8_ior (v256, v256);
+    VPAIR_FUNC_I8_IOR vpair_func_ior_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_max (v256, v256);
+    VPAIR_FUNC_I8_MAX vpair_func_smax_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_min (v256, v256);
+    VPAIR_FUNC_I8_MIN vpair_func_smin_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_neg (v256);
+    VPAIR_FUNC_I8_NEG vpair_func_neg_v32qi2 {mma,pair}
+
+  v256 __builtin_vpair_i8_not (v256);
+    VPAIR_FUNC_I8_NOT vpair_func_not_v32qi2 {mma,pair}
+
+  v256 __builtin_vpair_i8_splat (signed char);
+    VPAIR_FUNC_I8_SPLAT vpair_func_splat_v32qi {mma,pair}
+
+  v256 __builtin_vpair_i8_sub (v256, v256);
+    VPAIR_FUNC_I8_SUB vpair_func_sub_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8_xor (v256, v256);
+    VPAIR_FUNC_I8_XOR vpair_func_xor_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8u_assemble (vuc, vuc);
+    VPAIR_FUNC_I8U_ASSEMBLE vpair_func_assemble_v32qi {mma,pair}
+
+  vuc __builtin_vpair_i8u_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I8U_EXTRACT_VECTOR vpair_func_extract_vector_v32qi {mma,pair}
+
+  v256 __builtin_vpair_i8u_max (v256, v256);
+    VPAIR_FUNC_I8U_MAX vpair_func_umax_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8u_min (v256, v256);
+    VPAIR_FUNC_I8U_MIN vpair_func_umin_v32qi3 {mma,pair}
+
+  v256 __builtin_vpair_i8u_splat (unsigned char);
+    VPAIR_FUNC_I8U_SPLAT vpair_func_splat_v32qi {mma,pair}
+
+;; vector pair built-in functions for 16 16-bit unsigned short or
+;; signed short values
+
+  v256 __builtin_vpair_i16_add (v256, v256);
+    VPAIR_FUNC_I16_ADD vpair_func_add_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_and (v256, v256);
+    VPAIR_FUNC_I16_AND vpair_func_and_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_assemble (vss, vss);
+    VPAIR_FUNC_I16_ASSEMBLE vpair_func_assemble_v16hi {mma,pair}
+
+  vss __builtin_vpair_i16_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I16_EXTRACT_VECTOR vpair_func_extract_vector_v16hi {mma,pair}
+
+  v256 __builtin_vpair_i16_ior (v256, v256);
+    VPAIR_FUNC_I16_IOR vpair_func_ior_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_max (v256, v256);
+    VPAIR_FUNC_I16_MAX vpair_func_smax_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_min (v256, v256);
+    VPAIR_FUNC_I16_MIN vpair_func_smin_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_neg (v256);
+    VPAIR_FUNC_I16_NEG vpair_func_neg_v16hi2 {mma,pair}
+
+  v256 __builtin_vpair_i16_not (v256);
+    VPAIR_FUNC_I16_NOT vpair_func_not_v16hi2 {mma,pair}
+
+  v256 __builtin_vpair_i16_splat (short);
+    VPAIR_FUNC_I16_SPLAT vpair_func_splat_v16hi {mma,pair}
+
+  v256 __builtin_vpair_i16_sub (v256, v256);
+    VPAIR_FUNC_I16_SUB vpair_func_sub_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16_xor (v256, v256);
+    VPAIR_FUNC_I16_XOR vpair_func_xor_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16u_assemble (vus, vus);
+    VPAIR_FUNC_I16U_ASSEMBLE vpair_func_assemble_v16hi {mma,pair}
+
+  vus __builtin_vpair_i16u_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I16U_EXTRACT_VECTOR vpair_func_extract_vector_v16hi {mma,pair}
+
+  v256 __builtin_vpair_i16u_max (v256, v256);
+    VPAIR_FUNC_I16U_MAX vpair_func_umax_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16u_min (v256, v256);
+    VPAIR_FUNC_I16U_MIN vpair_func_umin_v16hi3 {mma,pair}
+
+  v256 __builtin_vpair_i16u_splat (unsigned short);
+    VPAIR_FUNC_I16U_SPLAT vpair_func_splat_v16hi {mma,pair}
+
+;; vector pair built-in functions for 8 32-bit unsigned int or
+;; signed int values
+
+  v256 __builtin_vpair_i32_add (v256, v256);
+    VPAIR_FUNC_I32_ADD vpair_func_add_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_and (v256, v256);
+    VPAIR_FUNC_I32_AND vpair_func_and_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_assemble (vsi, vsi);
+    VPAIR_FUNC_I32_ASSEMBLE vpair_func_assemble_v8si {mma,pair}
+
+  vsi __builtin_vpair_i32_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I32_EXTRACT_VECTOR vpair_func_extract_vector_v8si {mma,pair}
+
+  v256 __builtin_vpair_i32_ior (v256, v256);
+    VPAIR_FUNC_I32_IOR vpair_func_ior_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_max (v256, v256);
+    VPAIR_FUNC_I32_MAX vpair_func_smax_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_min (v256, v256);
+    VPAIR_FUNC_I32_MIN vpair_func_smin_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_neg (v256);
+    VPAIR_FUNC_I32_NEG vpair_func_neg_v8si2 {mma,pair}
+
+  v256 __builtin_vpair_i32_not (v256);
+    VPAIR_FUNC_I32_NOT vpair_func_not_v8si2 {mma,pair}
+
+  v256 __builtin_vpair_i32_splat (int);
+    VPAIR_FUNC_I32_SPLAT vpair_func_splat_v8si {mma,pair}
+
+  v256 __builtin_vpair_i32_sub (v256, v256);
+    VPAIR_FUNC_I32_SUB vpair_func_sub_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32_xor (v256, v256);
+    VPAIR_FUNC_I32_XOR vpair_func_xor_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32u_assemble (vui, vui);
+    VPAIR_FUNC_I32U_ASSEMBLE vpair_func_assemble_v8si {mma,pair}
+
+  vui __builtin_vpair_i32u_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I32U_EXTRACT_VECTOR vpair_func_extract_vector_v8si {mma,pair}
+
+  v256 __builtin_vpair_i32u_max (v256, v256);
+    VPAIR_FUNC_I32U_MAX vpair_func_umax_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32u_min (v256, v256);
+    VPAIR_FUNC_I32U_MIN vpair_func_umin_v8si3 {mma,pair}
+
+  v256 __builtin_vpair_i32u_splat (unsigned int);
+    VPAIR_FUNC_I32U_SPLAT vpair_func_splat_v8si {mma,pair}
+
+;; vector pair built-in functions for 4 64-bit unsigned long long or
+;; signed long long values
+
+  v256 __builtin_vpair_i64_add (v256, v256);
+    VPAIR_FUNC_I64_ADD vpair_func_add_v4di3 {mma,pair}
+
+  long long __builtin_vpair_i64_add_elements (v256);
+    VPAIR_FUNC_I64_ADD_ELEMENTS vpair_func_reduc_plus_scale_v4di {mma,pair,no32bit}
+
+  v256 __builtin_vpair_i64_and (v256, v256);
+    VPAIR_FUNC_I64_AND vpair_func_and_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64_assemble (vsll, vsll);
+    VPAIR_FUNC_I64_ASSEMBLE vpair_func_assemble_v4di {mma,pair}
+
+  vsll __builtin_vpair_i64_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I64_EXTRACT_VECTOR vpair_func_extract_vector_v4di {mma,pair}
+
+  v256 __builtin_vpair_i64_ior (v256, v256);
+    VPAIR_FUNC_I64_IOR vpair_func_ior_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64_max (v256, v256);
+    VPAIR_FUNC_I64_MAX vpair_func_smax_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64_min (v256, v256);
+    VPAIR_FUNC_I64_MIN vpair_func_smin_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64_neg (v256);
+    VPAIR_FUNC_I64_NEG vpair_func_neg_v4di2 {mma,pair}
+
+  v256 __builtin_vpair_i64_not (v256);
+    VPAIR_FUNC_I64_NOT vpair_func_not_v4di2 {mma,pair}
+
+  v256 __builtin_vpair_i64_splat (long long);
+    VPAIR_FUNC_I64_SPLAT vpair_func_splat_v4di {mma,pair}
+
+  v256 __builtin_vpair_i64_sub (v256, v256);
+    VPAIR_FUNC_I64_SUB vpair_func_sub_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64_xor (v256, v256);
+    VPAIR_FUNC_I64_XOR vpair_func_xor_v4di3 {mma,pair}
+
+  unsigned long long __builtin_vpair_i64u_add_elements (v256);
+    VPAIR_FUNC_I64U_ADD_ELEMENTS vpair_func_reduc_plus_scale_v4di {mma,pair,no32bit}
+
+  v256 __builtin_vpair_i64u_assemble (vull, vull);
+    VPAIR_FUNC_I64U_ASSEMBLE vpair_func_assemble_v4di {mma,pair}
+
+  vull __builtin_vpair_i64u_extract_vector (v256, const int<1>);
+    VPAIR_FUNC_I64U_EXTRACT_VECTOR vpair_func_extract_vector_v4di {mma,pair}
+
+  v256 __builtin_vpair_i64u_max (v256, v256);
+    VPAIR_FUNC_I64U_MAX vpair_func_umax_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64u_min (v256, v256);
+    VPAIR_FUNC_I64U_MIN vpair_func_umin_v4di3 {mma,pair}
+
+  v256 __builtin_vpair_i64u_splat (unsigned long long);
+    VPAIR_FUNC_I64U_SPLAT vpair_func_splat_v4di {mma,pair}
diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def
index 38d92fcf1f0..0f965b91a7c 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -6197,3 +6197,25 @@
     VUPKLSW  VUPKLSW_DEPR1
   vbll __builtin_vec_vupklsw (vbi);
     VUPKLSW  VUPKLSW_DEPR2
+
+[VPAIR_ASSEMBLE, vpair_assemble, __builtin_vpair_assemble]
+  v256 __builtin_vpair_assemble (vf, vf);
+    VPAIR_FUNC_F32_ASSEMBLE
+  v256 __builtin_vpair_assemble (vd, vd);
+    VPAIR_FUNC_F64_ASSEMBLE
+  v256 __builtin_vpair_assemble (vull, vull);
+    VPAIR_FUNC_I64U_ASSEMBLE
+  v256 __builtin_vpair_assemble (vsll, vsll);
+    VPAIR_FUNC_I64_ASSEMBLE
+  v256 __builtin_vpair_assemble (vui, vui);
+    VPAIR_FUNC_I32U_ASSEMBLE
+  v256 __builtin_vpair_assemble (vsi, vsi);
+    VPAIR_FUNC_I32_ASSEMBLE
+  v256 __builtin_vpair_assemble (vus, vus);
+    VPAIR_FUNC_I16U_ASSEMBLE
+  v256 __builtin_vpair_assemble (vss, vss);
+    VPAIR_FUNC_I16_ASSEMBLE
+  v256 __builtin_vpair_assemble (vuc, vuc);
+    VPAIR_FUNC_I8U_ASSEMBLE
+  v256 __builtin_vpair_assemble (vsc, vsc);
+    VPAIR_FUNC_I8_ASSEMBLE
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..bbd899d7562 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -138,6 +138,11 @@ extern void rs6000_emit_swsqrt (rtx, rtx, bool);
 extern void output_toc (FILE *, rtx, int, machine_mode);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
+extern void split_unary_vector_pair (machine_mode, rtx [], rtx (*)(rtx, rtx));
+extern void split_binary_vector_pair (machine_mode, rtx [],
+				      rtx (*)(rtx, rtx, rtx));
+extern void split_fma_vector_pair (machine_mode, rtx [],
+				   rtx (*)(rtx, rtx, rtx, rtx));
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_le_vsx_permute (rtx, rtx, machine_mode);
 extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0dd21e67dde..2c30bfb0e70 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -27408,6 +27408,80 @@ rs6000_split_logical (rtx operands[3],
   return;
 }
 
+/* Split a unary vector pair insn into two separate vector insns.  */
+
+void
+split_unary_vector_pair (machine_mode mode,		/* vector mode.  */
+			 rtx operands[],		/* dest, src.  */
+			 rtx (*func)(rtx, rtx))		/* create insn.  */
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1));
+  return;
+}
+
+/* Split a binary vector pair insn into two separate vector insns.  */
+
+void
+split_binary_vector_pair (machine_mode mode,		/* vector mode.  */
+			  rtx operands[],		/* dest, src1, src2.  */
+			  rtx (*func)(rtx, rtx, rtx))	/* create insn.  */
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0);
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+  rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1));
+  return;
+}
+
+/* Split a fused multiply-add vector pair insn into two separate vector
+   insns.  */
+
+void
+split_fma_vector_pair (machine_mode mode,		/* vector mode.  */
+		       rtx operands[],			/* dest, src1, src2, src3.  */
+		       rtx (*func)(rtx, rtx, rtx, rtx))	/* create insn.  */
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op2 = operands[2];
+  rtx op3 = operands[3];
+  machine_mode orig_mode = GET_MODE (op0);
+
+  rtx reg0_vector0 = simplify_gen_subreg (mode, op0, orig_mode, 0);
+  rtx reg1_vector0 = simplify_gen_subreg (mode, op1, orig_mode, 0);
+  rtx reg2_vector0 = simplify_gen_subreg (mode, op2, orig_mode, 0);
+  rtx reg3_vector0 = simplify_gen_subreg (mode, op3, orig_mode, 0);
+
+  rtx reg0_vector1 = simplify_gen_subreg (mode, op0, orig_mode, 16);
+  rtx reg1_vector1 = simplify_gen_subreg (mode, op1, orig_mode, 16);
+  rtx reg2_vector1 = simplify_gen_subreg (mode, op2, orig_mode, 16);
+  rtx reg3_vector1 = simplify_gen_subreg (mode, op3, orig_mode, 16);
+
+  emit_insn (func (reg0_vector0, reg1_vector0, reg2_vector0, reg3_vector0));
+  emit_insn (func (reg0_vector1, reg1_vector1, reg2_vector1, reg3_vector1));
+  return;
+}
+
 /* Emit instructions to move SRC to DST.  Called by splitters for
    multi-register moves.  It will emit at most one instruction for
    each register that is accessed; that is, it won't emit li/lis pairs
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index dcf1f3526f5..1243fad9753 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -15767,6 +15767,7 @@
 (include "vsx.md")
 (include "altivec.md")
 (include "mma.md")
+(include "vpair-func.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index f183b42ce1d..592e6cdf1e2 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -128,6 +128,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
 	$(srcdir)/config/rs6000/mma.md \
+	$(srcdir)/config/rs6000/vpair-func.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md \
diff --git a/gcc/config/rs6000/vpair-func.md b/gcc/config/rs6000/vpair-func.md
new file mode 100644
index 00000000000..0f967c8dab6
--- /dev/null
+++ b/gcc/config/rs6000/vpair-func.md
@@ -0,0 +1,881 @@
+;; Vector pair arithmetic support.
+;; Copyright (C) 2020-2023 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+;;
+;; This file adds support for doing vector operations on pairs of vector
+;; registers.  Most of the patterns use vector pair instructions to load
+;; and possibly store the registers, but split the operation after register
+;; allocation into two separate vector operations.  The second scheduler
+;; pass can then interleave other instructions between these pairs of
+;; operations if possible.
+
+(define_c_enum "unspec"
+  [UNSPEC_VPAIR_FUNC_V4DF
+   UNSPEC_VPAIR_FUNC_V8SF
+   UNSPEC_VPAIR_FUNC_V32QI
+   UNSPEC_VPAIR_FUNC_V16HI
+   UNSPEC_VPAIR_FUNC_V8SI
+   UNSPEC_VPAIR_FUNC_V4DI
+   UNSPEC_VPAIR_FUNC_ZERO
+   UNSPEC_VPAIR_FUNC_SPLAT
+   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_F32
+   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_F64
+   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_I64
+   ])
+
+;; Iterator doing unary/binary arithmetic on vector pairs
+(define_code_iterator VP_FUNC_FP_UNARY  [abs neg])
+(define_code_iterator VP_FUNC_FP_BINARY [minus mult plus smin smax])
+
+(define_code_iterator VP_FUNC_INT_BINARY  [and ior minus plus smax smin umax umin xor])
+
+;; Return the insn name from the VP_* code iterator
+(define_code_attr vp_func_insn [(abs      "abs")
+				(and      "and")
+				(ior      "ior")
+				(minus    "sub")
+				(mult     "mul")
+				(not      "one_cmpl")
+				(neg      "neg")
+				(plus     "add")
+				(smin     "smin")
+				(smax     "smax")
+				(umin     "umin")
+				(umax     "umax")
+				(xor      "xor")])
+
+;; Return the register constraint ("v" or "wa") for the integer code iterator
+;; used.  For arithmetic operations, we need to use "v" in order to use the
+;; Altivec instruction.  For logical operations, we can use wa.
+(define_code_attr vp_func_ireg [(and   "wa")
+				(ior   "wa")
+				(minus "v")
+				(not   "wa")
+				(neg   "v")
+				(plus  "v")
+				(smax  "v")
+				(smin  "v")
+				(umax  "v")
+				(umin  "v")
+				(xor   "wa")])
+
+;; Return the register predicate for the integer code iterator used.
+(define_code_attr vp_func_ipredicate [(and   "vsx_register_operand")
+				      (ior   "vsx_register_operand")
+				      (minus "altivec_register_operand")
+				      (not   "vsx_register_operand")
+				      (neg   "altivec_register_operand")
+				      (plus  "altivec_register_operand")
+				      (smax  "altivec_register_operand")
+				      (smin  "altivec_register_operand")
+				      (umax  "altivec_register_operand")
+				      (umin  "altivec_register_operand")
+				      (xor   "vsx_register_operand")])
+
+;; Iterator for creating the unspecs for vector pair built-ins
+(define_int_iterator VP_FUNC_FP [UNSPEC_VPAIR_FUNC_V4DF
+				 UNSPEC_VPAIR_FUNC_V8SF])
+
+(define_int_iterator VP_FUNC_INT [UNSPEC_VPAIR_FUNC_V4DI
+				  UNSPEC_VPAIR_FUNC_V8SI
+				  UNSPEC_VPAIR_FUNC_V16HI
+				  UNSPEC_VPAIR_FUNC_V32QI])
+
+(define_int_iterator VP_FUNC_ALL [UNSPEC_VPAIR_FUNC_V4DF
+				  UNSPEC_VPAIR_FUNC_V8SF
+				  UNSPEC_VPAIR_FUNC_V4DI
+				  UNSPEC_VPAIR_FUNC_V8SI
+				  UNSPEC_VPAIR_FUNC_V16HI
+				  UNSPEC_VPAIR_FUNC_V32QI])
+
+;; Map VP_* to vector mode of the arguments after they are split
+(define_int_attr VP_VEC_MODE [(UNSPEC_VPAIR_FUNC_V4DF  "V2DF")
+			      (UNSPEC_VPAIR_FUNC_V8SF  "V4SF")
+			      (UNSPEC_VPAIR_FUNC_V32QI "V16QI")
+			      (UNSPEC_VPAIR_FUNC_V16HI "V8HI")
+			      (UNSPEC_VPAIR_FUNC_V8SI  "V4SI")
+			      (UNSPEC_VPAIR_FUNC_V4DI  "V2DI")])
+
+;; Map VP_* to a lower case name to identify the vector pair.
+(define_int_attr vp_pmode [(UNSPEC_VPAIR_FUNC_V4DF  "v4df")
+			   (UNSPEC_VPAIR_FUNC_V8SF  "v8sf")
+			   (UNSPEC_VPAIR_FUNC_V32QI "v32qi")
+			   (UNSPEC_VPAIR_FUNC_V16HI "v16hi")
+			   (UNSPEC_VPAIR_FUNC_V8SI  "v8si")
+			   (UNSPEC_VPAIR_FUNC_V4DI  "v4di")])
+
+;; Map VP_* to a lower case name to identify the vector after the vector pair
+;; has been split.
+(define_int_attr vp_vmode [(UNSPEC_VPAIR_FUNC_V4DF  "v2df")
+			   (UNSPEC_VPAIR_FUNC_V8SF  "v4sf")
+			   (UNSPEC_VPAIR_FUNC_V32QI "v16qi")
+			   (UNSPEC_VPAIR_FUNC_V16HI "v8hi")
+			   (UNSPEC_VPAIR_FUNC_V8SI  "v4si")
+			   (UNSPEC_VPAIR_FUNC_V4DI  "v2di")])
+
+;; Map VP_FUNC_INT to constraints used for the negate scratch register.  For vectors
+;; of QI and HI, we need to change -a into 0 - a since we don't have a negate
+;; operation.  We do have a vnegw/vnegd operation for SI and DI modes.
+(define_int_attr vp_neg_reg [(UNSPEC_VPAIR_FUNC_V32QI "&v")
+			     (UNSPEC_VPAIR_FUNC_V16HI "&v")
+			     (UNSPEC_VPAIR_FUNC_V8SI  "X")
+			     (UNSPEC_VPAIR_FUNC_V4DI  "X")])
+
+;; Modes of the vector element to splat to a vector pair
+(define_mode_iterator VP_FUNC_SPLAT [DF SF DI SI HI QI])
+
+;; Modes of the vector to splat to a vector pair
+(define_mode_iterator VP_FUNC_SPLAT_VEC [V2DF V4SF V2DI V4SI V8HI V16QI])
+
+;; Map VP_FUNC_SPLAT and VP_FUNC_SPLAT_VEC to the mode of the vector pair
+;; operation
+(define_mode_attr vp_func_splat_pmode [(DF    "v4df")
+				       (V2DF  "v4df")
+				       (SF    "v8sf")
+				       (V4SF  "v8sf")
+				       (DI    "v4di")
+				       (V2DI  "v4di")
+				       (SI    "v8si")
+				       (V4SI  "v8si")
+				       (HI    "v16hi")
+				       (V8HI  "v16hi")
+				       (QI    "v32qi")
+				       (V16QI "v32qi")])
+
+;; Map VP_FUNC_SPLAT to the mode of the vector containing the element
+(define_mode_attr VP_FUNC_SPLAT_VMODE [(DF "V2DF")
+				       (SF "V4SF")
+				       (DI "V2DI")
+				       (SI "V4SI")
+				       (HI "V8HI")
+				       (QI "V16QI")])
+
+;; Initialize a vector pair to 0
+(define_insn_and_split "vpair_func_zero"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(const_int 0)] UNSPEC_VPAIR_FUNC_ZERO))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 1) (match_dup 3))
+   (set (match_dup 2) (match_dup 3))]
+{
+  rtx op0 = operands[0];
+  unsigned offset_hi = (WORDS_BIG_ENDIAN) ? 0 : 16;
+  unsigned offset_lo = (WORDS_BIG_ENDIAN) ? 16 : 0;
+
+  operands[1] = simplify_gen_subreg (V2DImode, op0, OOmode, offset_hi);
+  operands[2] = simplify_gen_subreg (V2DImode, op0, OOmode, offset_lo);
+  operands[3] = CONST0_RTX (V2DImode);
+}
+  [(set_attr "length" "8")])
+
+;; Assemble a vector pair from two vectors.  Unlike
+;; __builtin_mma_assemble_pair, this function produces a vector pair output
+;; directly, and it accepts all of the vector types.
+;;
+;; We cannot update the two output registers atomically, so mark the output
+;; as an early clobber so that we don't accidentally clobber the input
+;; operands.
+
+(define_insn_and_split "vpair_func_assemble_<vp_pmode>"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=&wa")
+	(unspec:OO
+	 [(match_operand:<VP_VEC_MODE> 1 "mma_assemble_input_operand" "mwa")
+	  (match_operand:<VP_VEC_MODE> 2 "mma_assemble_input_operand" "mwa")]
+	 VP_FUNC_ALL))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx src = gen_rtx_UNSPEC (OOmode,
+			    gen_rtvec (2, operands[1], operands[2]),
+			    UNSPEC_VSX_ASSEMBLE);
+  rs6000_split_multireg_move (operands[0], src);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Extract one of the two 128-bit vectors from a vector pair.
+(define_insn_and_split "vpair_func_extract_vector_<vp_pmode>"
+  [(set (match_operand:<VP_VEC_MODE> 0 "vsx_register_operand" "=wa")
+	(unspec:<VP_VEC_MODE>
+	 [(match_operand:OO 1 "vsx_register_operand" "wa")
+	  (match_operand 2 "const_0_to_1_operand" "n")]
+	 VP_FUNC_ALL))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 3))]
+{
+  machine_mode vmode = <VP_VEC_MODE>mode;
+  unsigned reg_num = UINTVAL (operands[2]);
+  if (!WORDS_BIG_ENDIAN)
+    reg_num = 1 - reg_num;
+
+  operands[3] = simplify_gen_subreg (vmode, operands[1], OOmode, reg_num * 16);
+})
+
+;; Optimize extracting a 128-bit vector from a vector pair in memory.
+(define_insn_and_split "*vpair_func_extract_vector_<vp_pmode>_mem"
+  [(set (match_operand:<VP_VEC_MODE> 0 "vsx_register_operand" "=wa")
+	(unspec:<VP_VEC_MODE>
+	 [(match_operand:OO 1 "memory_operand" "o")
+	  (match_operand 2 "const_0_to_1_operand" "n")]
+	 VP_FUNC_ALL))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0) (match_dup 3))]
+{
+  operands[3] = adjust_address (operands[1], <VP_VEC_MODE>mode,
+				16 * INTVAL (operands[2]));
+}
+  [(set_attr "type" "vecload")])
+
+;; Create a vector pair with a value splatted (duplicated) to all of the
+;; elements.
+(define_expand "vpair_func_splat_<vp_func_splat_pmode>"
+  [(use (match_operand:OO 0 "vsx_register_operand"))
+   (use (match_operand:VP_FUNC_SPLAT 1 "input_operand"))]
+  "TARGET_MMA"
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  machine_mode element_mode = <MODE>mode;
+  machine_mode vector_mode = <VP_FUNC_SPLAT_VMODE>mode;
+
+  if (op1 == CONST0_RTX (element_mode))
+    {
+      emit_insn (gen_vpair_func_zero (op0));
+      DONE;
+    }
+
+  rtx vec = gen_reg_rtx (vector_mode);
+  unsigned num_elements = GET_MODE_NUNITS (vector_mode);
+  rtvec elements = rtvec_alloc (num_elements);
+  for (size_t i = 0; i < num_elements; i++)
+    RTVEC_ELT (elements, i) = copy_rtx (op1);
+
+  rs6000_expand_vector_init (vec, gen_rtx_PARALLEL (vector_mode, elements));
+  emit_insn (gen_vpair_func_splat_<vp_func_splat_pmode>_internal (op0, vec));
+  DONE;
+})
+
+;; Inner splat support.  Operand 1 is the vector splat created above.  Allow
+;; operand 1 to overlap with the output registers to eliminate one move
+;; instruction.
+(define_insn_and_split "vpair_func_splat_<vp_func_splat_pmode>_internal"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(match_operand:VP_FUNC_SPLAT_VEC 1 "vsx_register_operand" "0,wa")]
+	 UNSPEC_VPAIR_FUNC_SPLAT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx op0_vector0 = simplify_gen_subreg (<MODE>mode, op0, OOmode, 0);
+  rtx op0_vector1 = simplify_gen_subreg (<MODE>mode, op0, OOmode, 16);
+
+  /* Check if the input is one of the output registers.  */
+  if (rtx_equal_p (op0_vector0, op1))
+    emit_move_insn (op0_vector1, op1);
+
+  else if (rtx_equal_p (op0_vector1, op1))
+    emit_move_insn (op0_vector0, op1);
+
+  else
+    {
+      emit_move_insn (op0_vector0, op1);
+      emit_move_insn (op0_vector1, op1);
+    }
+
+  DONE;
+}
+  [(set_attr "length" "*,8")
+   (set_attr "type" "vecmove")])
+
+\f
+;; Vector pair floating point unary operations
+(define_insn_and_split "vpair_func_<vp_func_insn>_<vp_pmode>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(VP_FUNC_FP_UNARY:OO
+		     (match_operand:OO 1 "vsx_register_operand" "wa"))]
+		   VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VP_VEC_MODE>mode, operands,
+			   gen_<vp_func_insn><vp_vmode>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair negate of absolute value
+(define_insn_and_split "vpair_func_nabs_<vp_pmode>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(abs:OO (match_operand:OO 1 "vsx_register_operand" "ww"))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VP_VEC_MODE>mode, operands,
+			   gen_vsx_nabs<vp_vmode>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point binary operations
+(define_insn_and_split "vpair_func_<vp_func_insn>_<vp_pmode>3"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(VP_FUNC_FP_BINARY:OO
+		     (match_operand:OO 1 "vsx_register_operand" "wa")
+		     (match_operand:OO 2 "vsx_register_operand" "wa"))]
+		   VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_<vp_func_insn><vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair fused multiply-add floating point operations
+(define_insn_and_split "vpair_func_fma_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(fma:OO
+	   (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	   (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+			 gen_fma<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_func_fms_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(fma:OO
+	   (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	   (unspec:OO
+	    [(neg:OO (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	     VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+			 gen_fms<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_func_nfma_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(fma:OO
+	      (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	      (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+			 gen_nfma<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "vpair_func_nfms_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(fma:OO
+	      (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0")
+	      (unspec:OO
+	       [(neg:OO (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	       VP_FUNC_FP))]
+	   VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VP_VEC_MODE>mode, operands,
+			 gen_nfms<vp_vmode>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) + c into vector pair fma (a, b, c).
+(define_insn_and_split "*vpair_func_fma_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(plus:OO
+	   (unspec:OO
+	    [(mult:OO
+	      (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+	    VP_FUNC_FP)
+	   (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(fma:OO
+	   (match_dup 1)
+	   (match_dup 2)
+	   (match_dup 3))]
+	 VP_FUNC_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) - c into vector pair fma (a, b, -c)
+(define_insn_and_split "*vpair_func_fms_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(minus:OO
+	   (unspec:OO
+	    [(mult:OO
+	      (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+	      (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+	    VP_FUNC_FP)
+	   (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(fma:OO
+	   (match_dup 1)
+	   (match_dup 2)
+	   (unspec:OO
+	    [(neg:OO
+	      (match_dup 3))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+
+;; Optimize vector pair -((a * b) + c) into vector pair -fma (a, b, c).
+(define_insn_and_split "*vpair_func_nfma_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(plus:OO
+	      (unspec:OO
+	       [(mult:OO
+		 (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+		 (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+	       VP_FUNC_FP)
+	      (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(fma:OO
+	      (match_dup 1)
+	      (match_dup 2)
+	      (match_dup 3))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair -((a * b) - c) into vector pair -fma (a, b, -c)
+(define_insn_and_split "*vpair_func_nfms_fpcontract_<vp_pmode>4"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa,wa")
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(minus:OO
+	      (unspec:OO
+	       [(mult:OO
+		 (match_operand:OO 1 "vsx_register_operand" "%wa,wa")
+		 (match_operand:OO 2 "vsx_register_operand" "wa,0"))]
+	       VP_FUNC_FP)
+	      (match_operand:OO 3 "vsx_register_operand" "0,wa"))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+  "TARGET_MMA && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:OO
+	 [(neg:OO
+	   (unspec:OO
+	    [(fma:OO
+	      (match_dup 1)
+	      (match_dup 2)
+	      (unspec:OO
+	       [(neg:OO
+		 (match_dup 3))]
+	       VP_FUNC_FP))]
+	    VP_FUNC_FP))]
+	 VP_FUNC_FP))]
+{
+}
+  [(set_attr "length" "8")])
+
+\f
+;; Add all elements in a pair of V4SF vectors.
+(define_insn_and_split "vpair_func_reduc_plus_scale_v8sf"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
+	(unspec:SF [(match_operand:OO 1 "vsx_register_operand" "v")]
+		   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_F32))
+   (clobber (match_scratch:V4SF 2 "=&v"))
+   (clobber (match_scratch:V4SF 3 "=&v"))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(pc)]
+{
+  rtx op0 = operands[0];
+  rtx op1 = operands[1];
+  rtx tmp1 = operands[2];
+  rtx tmp2 = operands[3];
+  unsigned r = reg_or_subregno (op1);
+  rtx op1_hi = gen_rtx_REG (V4SFmode, r);
+  rtx op1_lo = gen_rtx_REG (V4SFmode, r + 1);
+
+  emit_insn (gen_addv4sf3 (tmp1, op1_hi, op1_lo));
+  emit_insn (gen_altivec_vsldoi_v4sf (tmp2, tmp1, tmp1, GEN_INT (8)));
+  emit_insn (gen_addv4sf3 (tmp2, tmp1, tmp2));
+  emit_insn (gen_altivec_vsldoi_v4sf (tmp1, tmp2, tmp2, GEN_INT (4)));
+  emit_insn (gen_addv4sf3 (tmp2, tmp1, tmp2));
+  emit_insn (gen_vsx_xscvspdp_scalar2 (op0, tmp2));
+  DONE;
+}
+  [(set_attr "length" "24")])
+
+;; Add all elements in a pair of V2DF vectors.
+(define_insn_and_split "vpair_func_reduc_plus_scale_v4df"
+  [(set (match_operand:DF 0 "vsx_register_operand" "=&wa")
+	(unspec:DF [(match_operand:OO 1 "vsx_register_operand" "wa")]
+		   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_F64))
+   (clobber (match_scratch:DF 2 "=&wa"))
+   (clobber (match_scratch:V2DF 3 "=&wa"))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 3)
+	(plus:V2DF (match_dup 4)
+		   (match_dup 5)))
+   (set (match_dup 2)
+	(vec_select:DF (match_dup 3)
+		       (parallel [(match_dup 6)])))
+   (set (match_dup 0)
+	(plus:DF (match_dup 7)
+		 (match_dup 2)))]
+{
+  unsigned reg1 = reg_or_subregno (operands[1]);
+  unsigned reg3 = reg_or_subregno (operands[3]);
+
+  operands[4] = gen_rtx_REG (V2DFmode, reg1);
+  operands[5] = gen_rtx_REG (V2DFmode, reg1 + 1);
+  operands[6] = GEN_INT (BYTES_BIG_ENDIAN ? 1 : 0);
+  operands[7] = gen_rtx_REG (DFmode, reg3);
+})
+
+\f
+;; Vector pair integer negate support.
+(define_insn_and_split "vpair_func_neg_<vp_pmode>2"
+  [(set (match_operand:OO 0 "altivec_register_operand" "=v")
+	(unspec:OO [(neg:OO
+		     (match_operand:OO 1 "altivec_register_operand" "v"))]
+		   VP_FUNC_INT))
+   (clobber (match_scratch:<VP_VEC_MODE> 2 "=<vp_neg_reg>"))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 2) (match_dup 3))
+   (set (match_dup 4) (minus:<VP_VEC_MODE> (match_dup 2)
+					   (match_dup 5)))
+   (set (match_dup 6) (minus:<VP_VEC_MODE> (match_dup 2)
+					   (match_dup 7)))]
+{
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+  machine_mode vmode = <VP_VEC_MODE>mode;
+
+  operands[3] = CONST0_RTX (vmode);
+
+  operands[4] = gen_rtx_REG (vmode, reg0);
+  operands[5] = gen_rtx_REG (vmode, reg1);
+
+  operands[6] = gen_rtx_REG (vmode, reg0 + 1);
+  operands[7] = gen_rtx_REG (vmode, reg1 + 1);
+
+  /* If the vector integer size is 32 or 64 bits, we can use the vneg{w,d}
+     instructions.  */
+  if (vmode == V4SImode)
+    {
+      emit_insn (gen_negv4si2 (operands[4], operands[5]));
+      emit_insn (gen_negv4si2 (operands[6], operands[7]));
+      DONE;
+    }
+  else if (vmode == V2DImode)
+    {
+      emit_insn (gen_negv2di2 (operands[4], operands[5]));
+      emit_insn (gen_negv2di2 (operands[6], operands[7]));
+      DONE;
+    }
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair integer not support.
+(define_insn_and_split "vpair_func_not_<vp_pmode>2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(not:OO (match_operand:OO 1 "vsx_register_operand" "wa"))]
+		   VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VP_VEC_MODE>mode, operands,
+			   gen_one_cmpl<vp_vmode>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair integer binary operations.
+(define_insn_and_split "vpair_func_<vp_func_insn>_<vp_pmode>3"
+  [(set (match_operand:OO 0 "<vp_func_ipredicate>" "=<vp_func_ireg>")
+	(unspec:OO [(VP_FUNC_INT_BINARY:OO
+		     (match_operand:OO 1 "<vp_func_ipredicate>" "<vp_func_ireg>")
+		     (match_operand:OO 2 "<vp_func_ipredicate>" "<vp_func_ireg>"))]
+		   VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_<vp_func_insn><vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair a & ~b
+(define_insn_and_split "*vpair_func_andc_<vp_pmode>"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(and:OO
+		     (unspec:OO
+		      [(not:OO
+			(match_operand:OO 1 "vsx_register_operand" "wa"))]
+		      VP_FUNC_INT)
+		     (match_operand:OO 2 "vsx_register_operand" "wa"))]
+		   VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_andc<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair a | ~b
+(define_insn_and_split "*vpair_func_iorc_<vp_pmode>"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO [(ior:OO
+		     (unspec:OO
+		      [(not:OO
+			(match_operand:OO 1 "vsx_register_operand" "wa"))]
+		      VP_FUNC_INT)
+		     (match_operand:OO 2 "vsx_register_operand" "wa"))]
+		   VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_orc<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair ~(a & b) or ((~a) | (~b))
+(define_insn_and_split "*vpair_func_nand_<vp_pmode>_1"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(not:OO
+	   (unspec:OO [(and:OO
+			(match_operand:OO 1 "vsx_register_operand" "wa")
+			(match_operand:OO 2 "vsx_register_operand" "wa"))]
+		      VP_FUNC_INT))]
+	 VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_nand<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "*vpair_func_nand_<vp_pmode>_2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(ior:OO
+	   (unspec:OO
+	    [(not:OO
+	      (match_operand:OO 1 "vsx_register_operand" "wa"))]
+	    VP_FUNC_INT)
+	   (unspec:OO
+	    [(not:OO
+	      (match_operand:OO 2 "vsx_register_operand" "wa"))]
+	    VP_FUNC_INT))]
+	 VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_nand<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair ~(a | b) or ((~a) & (~b))
+(define_insn_and_split "*vpair_func_nor_<vp_pmode>_1"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(not:OO
+	   (unspec:OO [(ior:OO
+			(match_operand:OO 1 "vsx_register_operand" "wa")
+			(match_operand:OO 2 "vsx_register_operand" "wa"))]
+		      VP_FUNC_INT))]
+	 VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_nor<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "*vpair_func_nor_<vp_pmode>_2"
+  [(set (match_operand:OO 0 "vsx_register_operand" "=wa")
+	(unspec:OO
+	 [(ior:OO
+	   (unspec:OO
+	    [(not:OO (match_operand:OO 1 "vsx_register_operand" "wa"))]
+	    VP_FUNC_INT)
+	   (unspec:OO
+	    [(not:OO (match_operand:OO 2 "vsx_register_operand" "wa"))]
+	    VP_FUNC_INT))]
+	 VP_FUNC_INT))]
+  "TARGET_MMA"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VP_VEC_MODE>mode, operands,
+			    gen_nor<vp_vmode>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Add all elements in a pair of V2DI vectors.
+(define_insn_and_split "vpair_func_reduc_plus_scale_v4di"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=&r")
+	(unspec:DI [(match_operand:OO 1 "altivec_register_operand" "v")]
+		   UNSPEC_VPAIR_FUNC_REDUCE_PLUS_I64))
+   (clobber (match_scratch:V2DI 2 "=&v"))
+   (clobber (match_scratch:DI 3 "=&r"))]
+  "TARGET_MMA && TARGET_POWERPC64"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 2)
+	(plus:V2DI (match_dup 4)
+		   (match_dup 5)))
+   (set (match_dup 3)
+	(vec_select:DI (match_dup 2)
+		       (parallel [(const_int 0)])))
+   (set (match_dup 0)
+	(vec_select:DI (match_dup 2)
+		       (parallel [(const_int 1)])))
+   (set (match_dup 0)
+	(plus:DI (match_dup 0)
+		 (match_dup 3)))]
+{
+  unsigned reg1 = reg_or_subregno (operands[1]);
+
+  operands[4] = gen_rtx_REG (V2DImode, reg1);
+  operands[5] = gen_rtx_REG (V2DImode, reg1 + 1);
+}
+  [(set_attr "length" "16")])
diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index 1ae589aeb29..2ec3ad0625c 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -15277,6 +15277,7 @@ instructions, but allow the compiler to schedule those calls.
 * NDS32 Built-in Functions::
 * Nvidia PTX Built-in Functions::
 * Basic PowerPC Built-in Functions::
+* PowerPC Vector Pair Built-in Functions Available on ISA 3.1::
 * PowerPC AltiVec/VSX Built-in Functions::
 * PowerPC Hardware Transactional Memory Built-in Functions::
 * PowerPC Atomic Memory Operation Functions::
@@ -21606,6 +21607,170 @@ int vec_any_le (vector unsigned __int128, vector unsigned __int128);
 @end smallexample
 
 
+@node PowerPC Vector Pair Built-in Functions Available on ISA 3.1
+@subsection PowerPC Vector Pair Built-in Functions Available on ISA 3.1
+
+GCC provides functions to speed up processing by using the type
+@code{__vector_pair} to hold two 128-bit vectors on processors that
+support ISA 3.1 (power10).  The @code{__vector_pair} type and the
+vector pair built-in functions require the MMA instruction set
+(@option{-mmma}) to be enabled, which is on by default for
+@option{-mcpu=power10}.
+
+By default, @code{__vector_pair} values are loaded with a single load
+vector pair instruction.  Each built-in function is then executed as
+two separate vector instructions, one for each of the two 128-bit
+vectors within the vector pair.  The @code{__vector_pair} value is
+usually stored with a single store vector pair instruction.
+
+The following built-in functions are independent of the type of the
+underlying vector:
+
+@smallexample
+__vector_pair __builtin_vpair_zero ();
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector float} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f32_abs (__vector_pair);
+__vector_pair __builtin_vpair_f32_add (__vector_pair, __vector_pair);
+float __builtin_vpair_f32_add_elements (__vector_pair);
+__vector_pair __builtin_vpair_f32_assemble (vector float, vector float);
+vector float __builtin_vpair_f32_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_f32_fma (__vector_pair, __vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f32_neg (__vector_pair);
+__vector_pair __builtin_vpair_f32_splat (float);
+__vector_pair __builtin_vpair_f32_sub (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector double} values:
+
+@smallexample
+__vector_pair __builtin_vpair_f64_abs (__vector_pair);
+__vector_pair __builtin_vpair_f64_add (__vector_pair, __vector_pair);
+double __builtin_vpair_f64_add_elements (__vector_pair);
+__vector_pair __builtin_vpair_f64_assemble (vector double, vector double);
+vector double __builtin_vpair_f64_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_f64_fma (__vector_pair, __vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_mul (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_neg (__vector_pair);
+__vector_pair __builtin_vpair_f64_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_f64_splat (double);
+__vector_pair __builtin_vpair_f64_sub (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector long long} or @code{vector unsigned long long} values:
+
+@smallexample
+__vector_pair __builtin_vpair_i64_add (__vector_pair, __vector_pair);
+long long __builtin_vpair_i64_add_elements (__vector_pair);
+__vector_pair __builtin_vpair_i64_and (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64_assemble (vector long long,
+                                            vector long long);
+vector long long __builtin_vpair_i64_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i64_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64_neg (__vector_pair);
+__vector_pair __builtin_vpair_i64_not (__vector_pair);
+__vector_pair __builtin_vpair_i64_splat (long long);
+__vector_pair __builtin_vpair_i64_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i64u_assemble (vector unsigned long long,
+                                             vector unsigned long long);
+vector unsigned long long __builtin_vpair_i64u_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i64u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64u_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i64u_splat (unsigned long long);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector int} or @code{vector unsigned int} values:
+
+@smallexample
+__vector_pair __builtin_vpair_i32_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_and (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_assemble (vector int, vector int);
+vector int __builtin_vpair_i32_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i32_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_neg (__vector_pair);
+__vector_pair __builtin_vpair_i32_not (__vector_pair);
+__vector_pair __builtin_vpair_i32_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_splat (int);
+__vector_pair __builtin_vpair_i32_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i32u_assemble (vector unsigned int,
+                                             vector unsigned int);
+vector unsigned int __builtin_vpair_i32u_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i32u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32u_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i32u_splat (unsigned int);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector short} or @code{vector unsigned short} values:
+
+@smallexample
+__vector_pair __builtin_vpair_i16_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_and (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_assemble (vector short,
+                                            vector short);
+__vector_pair __builtin_vpair_i16_splat (short);
+vector short __builtin_vpair_i16_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i16_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_neg (__vector_pair);
+__vector_pair __builtin_vpair_i16_not (__vector_pair);
+__vector_pair __builtin_vpair_i16_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i16u_assemble (vector unsigned short,
+                                             vector unsigned short);
+vector unsigned short __builtin_vpair_i16u_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i16u_splat (unsigned short);
+__vector_pair __builtin_vpair_i16u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i16u_min (__vector_pair, __vector_pair);
+@end smallexample
+
+The following built-in functions operate on pairs of
+@code{vector signed char} or @code{vector unsigned char} values:
+
+@smallexample
+__vector_pair __builtin_vpair_i8_add (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_and (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_assemble (vector signed char,
+                                           vector signed char);
+vector signed char __builtin_vpair_i8_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i8_splat (signed char);
+__vector_pair __builtin_vpair_i8_ior (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_neg (__vector_pair);
+__vector_pair __builtin_vpair_i8_not (__vector_pair);
+__vector_pair __builtin_vpair_i8_sub (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8_xor (__vector_pair, __vector_pair);
+
+__vector_pair __builtin_vpair_i8u_assemble (vector unsigned char,
+                                            vector unsigned char);
+vector unsigned char __builtin_vpair_i8u_extract_vector (__vector_pair, int);
+__vector_pair __builtin_vpair_i8u_max (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8u_min (__vector_pair, __vector_pair);
+__vector_pair __builtin_vpair_i8u_splat (unsigned char);
+@end smallexample
+
 @node PowerPC Hardware Transactional Memory Built-in Functions
 @subsection PowerPC Hardware Transactional Memory Built-in Functions
 GCC provides two interfaces for accessing the Hardware Transactional
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
new file mode 100644
index 00000000000..e74840cebc0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-1.c
@@ -0,0 +1,135 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 4 double elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvadddp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvsubdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_sub (*x, *y);
+}
+
+void
+test_multiply (__vector_pair *dest,
+	       __vector_pair *x,
+	       __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmuldp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_mul (*x, *y);
+}
+
+void
+test_min (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmindp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_min (*x, *y);
+}
+
+void
+test_max (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmaxdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_max (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnegdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_neg (*x);
+}
+
+void
+test_abs (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvabsdp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_abs (*x);
+}
+
+void
+test_negative_abs (__vector_pair *dest,
+		   __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnabsdp, 1 stxvp.  */
+  __vector_pair ab = __builtin_vpair_f64_abs (*x);
+  *dest = __builtin_vpair_f64_neg (ab);
+}
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,q}dp, 1 stxvp.  */
+  *dest = __builtin_vpair_f64_fma (*x, *y, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,q}dp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f64_neg (*z);
+  *dest = __builtin_vpair_f64_fma (*x, *y, n);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,q}dp, 1 stxvp.  */
+  __vector_pair w = __builtin_vpair_f64_fma (*x, *y, *z);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,q}dp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f64_neg (*z);
+  __vector_pair w = __builtin_vpair_f64_fma (*x, *y, n);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}        25 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       12 } } */
+/* { dg-final { scan-assembler-times {\mxvabsdp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvadddp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.dp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxdp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvmindp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.dp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvmuldp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvnabsdp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegdp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvsubdp\M}      2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c
new file mode 100644
index 00000000000..df1c4019245
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-10.c
@@ -0,0 +1,86 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 32-bit floats.  */
+
+void
+test_f32_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_f32_splat (0.0f);
+}
+
+void
+test_f32_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltiw, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_f32_splat (1.0f);
+}
+
+void
+test_f32_splat_var (__vector_pair *p,
+		    float f)
+{
+  /* 1 xscvdpspn, 1 xxspltw, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_f32_splat (f);
+}
+
+void
+test_f32_splat_mem (__vector_pair *p,
+		    float *q)
+{
+  /* 1 lxvwsx, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_f32_splat (*q);
+}
+
+void
+test_f32_assemble (__vector_pair *p,
+		   vector float v1,
+		   vector float v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_f32_assemble (v1, v2);
+}
+
+vector float
+test_f32_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_f32_extract_vector (vp, 0);
+}
+
+vector float
+test_f32_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_f32_extract_vector (vp, 1);
+}
+
+vector float
+test_f32_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_f32_extract_vector (p[1], 0);
+}
+
+vector float
+test_f32_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_f32_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M}       2 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mlxvwsx\M}    1 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     5 } } */
+/* { dg-final { scan-assembler-times {\mxscvdpspn\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltw\M}   1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c
new file mode 100644
index 00000000000..397d7f60f45
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-11.c
@@ -0,0 +1,84 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 64-bit doubles.  */
+
+void
+test_f64_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_f64_splat (0.0);
+}
+
+void
+test_f64_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltidp, 1 xxlor.  */
+  *p = __builtin_vpair_f64_splat (1.0);
+}
+
+void
+test_f64_splat_var (__vector_pair *p,
+		    double d)
+{
+  /* 1 xxpermdi, 1 xxlor.  */
+  *p = __builtin_vpair_f64_splat (d);
+}
+
+void
+test_f64_splat_mem (__vector_pair *p,
+		    double *q)
+{
+  /* 1 lxvdsx, 1 xxlor.  */
+  *p = __builtin_vpair_f64_splat (*q);
+}
+
+void
+test_f64_assemble (__vector_pair *p,
+		   vector double v1,
+		   vector double v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_f64_assemble (v1, v2);
+}
+
+vector double
+test_f64_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_f64_extract_vector (vp, 0);
+}
+
+vector double
+test_f64_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_f64_extract_vector (vp, 1);
+}
+
+vector double
+test_f64_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_f64_extract_vector (p[1], 0);
+}
+
+vector double
+test_f64_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_f64_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvdsx\M}    1 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     5 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M}  1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c
new file mode 100644
index 00000000000..0990dfe28d5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-12.c
@@ -0,0 +1,156 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 64-bit integers.  */
+
+void
+test_i64_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i64_splat (0);
+}
+
+void
+test_i64_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltib, 1 vextsb2d, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64_splat (1);
+}
+
+void
+test_i64_splat_var (__vector_pair *p,
+		    long long ll)
+{
+  /* 1 mtvsrdd, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64_splat (ll);
+}
+
+void
+test_i64_splat_mem (__vector_pair *p,
+		    long long *q)
+{
+  /* 1 lxvdsx, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64_splat (*q);
+}
+
+void
+test_i64_assemble (__vector_pair *p,
+		   vector long long v1,
+		   vector long long v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64_assemble (v1, v2);
+}
+
+vector long long
+test_i64_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i64_extract_vector (vp, 0);
+}
+
+vector long long
+test_i64_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i64_extract_vector (vp, 1);
+}
+
+vector long long
+test_i64_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i64_extract_vector (p[1], 0);
+}
+
+vector long long
+test_i64_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i64_extract_vector (p[2], 1);
+}
+
+void
+test_i64u_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i64u_splat (0);
+}
+
+void
+test_i64u_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltib, 1 vextsb2d, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64u_splat (1);
+}
+
+void
+test_i64u_splat_var (__vector_pair *p,
+		     unsigned long long ull)
+{
+  /* 1 mtvsrdd, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64u_splat (ull);
+}
+
+void
+test_i64u_splat_mem (__vector_pair *p,
+		     unsigned long long *q)
+{
+  /* 1 lxvdsx, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64u_splat (*q);
+}
+
+void
+test_i64u_assemble (__vector_pair *p,
+		    vector unsigned long long v1,
+		    vector unsigned long long v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i64u_assemble (v1, v2);
+}
+
+vector unsigned long long
+test_i64u_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i64u_extract_vector (vp, 0);
+}
+
+vector unsigned long long
+test_i64u_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i64u_extract_vector (vp, 1);
+}
+
+vector unsigned long long
+test_i64u_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i64u_extract_vector (p[1], 0);
+}
+
+vector unsigned long long
+test_i64u_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i64u_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mlxvdsx\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}      4 } } */
+/* { dg-final { scan-assembler-times {\mmtvsrdd\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    10 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2d\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c
new file mode 100644
index 00000000000..8174f6b1cc3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-13.c
@@ -0,0 +1,139 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 32-bit integers.  */
+
+void
+test_i32_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i32_splat (0);
+}
+
+void
+test_i32_splat_1 (__vector_pair *p)
+{
+  /* 1 vspltisw, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32_splat (1);
+}
+
+void
+test_i32_splat_mem (__vector_pair *p,
+		    int *q)
+{
+  /* 1 lxvwsx, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32_splat (*q);
+}
+
+void
+test_i32_assemble (__vector_pair *p,
+		   vector int v1,
+		   vector int v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32_assemble (v1, v2);
+}
+
+vector int
+test_i32_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i32_extract_vector (vp, 0);
+}
+
+vector int
+test_i32_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i32_extract_vector (vp, 1);
+}
+
+vector int
+test_i32_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i32_extract_vector (p[1], 0);
+}
+
+vector int
+test_i32_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i32_extract_vector (p[2], 1);
+}
+
+void
+test_i32u_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i32u_splat (0);
+}
+
+void
+test_i32u_splat_1 (__vector_pair *p)
+{
+  /* 1 vspltisw, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32u_splat (1);
+}
+
+void
+test_i32u_splat_mem (__vector_pair *p,
+		     unsigned int *q)
+{
+  /* 1 lxvwsx, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32u_splat (*q);
+}
+
+void
+test_i32u_assemble (__vector_pair *p,
+		    vector unsigned int v1,
+		    vector unsigned int v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i32u_assemble (v1, v2);
+}
+
+vector unsigned int
+test_i32u_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i32u_extract_vector (vp, 0);
+}
+
+vector unsigned int
+test_i32u_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i32u_extract_vector (vp, 1);
+}
+
+vector unsigned int
+test_i32u_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i32u_extract_vector (p[1], 0);
+}
+
+vector unsigned int
+test_i32u_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i32u_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxv\M}      4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mlxvwsx\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    8 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c
new file mode 100644
index 00000000000..fe63df795d6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-14.c
@@ -0,0 +1,141 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 16-bit integers.  */
+
+void
+test_i16_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i16_splat (0);
+}
+
+void
+test_i16_splat_1 (__vector_pair *p)
+{
+  /* 1 vspltish, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16_splat (1);
+}
+
+void
+test_i16_splat_mem (__vector_pair *p,
+		    short *q)
+{
+  /* 1 lxsihzx, 1 vsplth, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16_splat (*q);
+}
+
+void
+test_i16_assemble (__vector_pair *p,
+		   vector short v1,
+		   vector short v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16_assemble (v1, v2);
+}
+
+vector short
+test_i16_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i16_extract_vector (vp, 0);
+}
+
+vector short
+test_i16_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i16_extract_vector (vp, 1);
+}
+
+vector short
+test_i16_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i16_extract_vector (p[1], 0);
+}
+
+vector short
+test_i16_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i16_extract_vector (p[2], 1);
+}
+
+void
+test_i16u_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i16u_splat (0);
+}
+
+void
+test_i16u_splat_1 (__vector_pair *p)
+{
+  /* 1 vspltish, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16u_splat (1);
+}
+
+void
+test_i16u_splat_mem (__vector_pair *p,
+		     unsigned short *q)
+{
+  /* 1 lxsihzx, 1 vsplth, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16u_splat (*q);
+}
+
+void
+test_i16u_assemble (__vector_pair *p,
+		    vector unsigned short v1,
+		    vector unsigned short v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i16u_assemble (v1, v2);
+}
+
+vector unsigned short
+test_i16u_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i16u_extract_vector (vp, 0);
+}
+
+vector unsigned short
+test_i16u_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i16u_extract_vector (vp, 1);
+}
+
+vector unsigned short
+test_i16u_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i16u_extract_vector (p[1], 0);
+}
+
+vector unsigned short
+test_i16u_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i16u_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxsihzx\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M}       4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}      4 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     8 } } */
+/* { dg-final { scan-assembler-times {\mvsplth\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}    12 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c
new file mode 100644
index 00000000000..bd494327af6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-15.c
@@ -0,0 +1,139 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test the vector pair built-in functions for creating and extracting
+   vector pairs of 8-bit integers.  */
+
+void
+test_i8_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i8_splat (0);
+}
+
+void
+test_i8_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltib, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8_splat (1);
+}
+
+void
+test_i8_splat_mem (__vector_pair *p,
+		   signed char *q)
+{
+  /* 1 lxsibzx, 1 vspltb, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8_splat (*q);
+}
+
+void
+test_i8_assemble (__vector_pair *p,
+		  vector signed char v1,
+		  vector signed char v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8_assemble (v1, v2);
+}
+
+vector signed char
+test_i8_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i8_extract_vector (vp, 0);
+}
+
+vector signed char
+test_i8_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i8_extract_vector (vp, 1);
+}
+
+vector signed char
+test_i8_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i8_extract_vector (p[1], 0);
+}
+
+vector signed char
+test_i8_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i8_extract_vector (p[2], 1);
+}
+
+void
+test_i8u_splat_0 (__vector_pair *p)
+{
+  /* 2 xxspltib, 1 stxvp.  */
+  *p = __builtin_vpair_i8u_splat (0);
+}
+
+void
+test_i8u_splat_1 (__vector_pair *p)
+{
+  /* 1 xxspltib, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8u_splat (1);
+}
+
+void
+test_i8u_splat_mem (__vector_pair *p,
+		    unsigned char *q)
+{
+  /* 1 lxsibzx, 1 vspltb, 1 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8u_splat (*q);
+}
+
+void
+test_i8u_assemble (__vector_pair *p,
+		   vector unsigned char v1,
+		   vector unsigned char v2)
+{
+  /* 2 xxlor, 1 stxvp.  */
+  *p = __builtin_vpair_i8u_assemble (v1, v2);
+}
+
+vector unsigned char
+test_i8u_extract_0_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i8u_extract_vector (vp, 0);
+}
+
+vector unsigned char
+test_i8u_extract_1_reg (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xxlor.  */
+  __vector_pair vp = *p;
+  __asm__ (" # extract in register %x0" : "+wa" (vp));
+  return __builtin_vpair_i8u_extract_vector (vp, 1);
+}
+
+vector unsigned char
+test_i8u_extract_0_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i8u_extract_vector (p[1], 0);
+}
+
+vector unsigned char
+test_i8u_extract_1_mem (__vector_pair *p)
+{
+  /* 1 lxv.  */
+  return __builtin_vpair_i8u_extract_vector (p[2], 1);
+}
+
+/* { dg-final { scan-assembler-times {\mlxsibzx\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M}      4 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    8 } } */
+/* { dg-final { scan-assembler-times {\mvspltb\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c
new file mode 100644
index 00000000000..a8c206c4093
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-16.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test vector pair built-in functions to do a horizontal add of the
+   elements.  */
+
+float
+f32_add_elements (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xvaddsp, 2 vsldoi, 2 xvaddsp, 1 xscvspdp.  */
+  return __builtin_vpair_f32_add_elements (*p);
+}
+
+double
+f64_add_elements (__vector_pair *p)
+{
+  /* 1 lxvp, 1 xvadddp, 1 xxpermdi, 1 fadd/xsadddp.  */
+  return __builtin_vpair_f64_add_elements (*p);
+}
+
+long long
+i64_add_elements (__vector_pair *p)
+{
+  /* 1 lxvp, 1 vaddudm, 1 mfvsrld, 1 mfvsrd, 1 add.  */
+  return __builtin_vpair_i64_add_elements (*p);
+}
+
+unsigned long long
+i64u_add_elements (__vector_pair *p)
+{
+  /* 1 lxvp, 1 vaddudm, 1 mfvsrld, 1 mfvsrd, 1 add.  */
+  return __builtin_vpair_i64u_add_elements (*p);
+}
+
+/* { dg-final { scan-assembler-times {\mfadd\M|\mxsadddp\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mlxvp\M}             4 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrd\M}           2 } } */
+/* { dg-final { scan-assembler-times {\mmfvsrld\M}          2 } } */
+/* { dg-final { scan-assembler-times {\mvaddudm\M}          2 } } */
+/* { dg-final { scan-assembler-times {\mvsldoi\M}           2 } } */
+/* { dg-final { scan-assembler-times {\mxscvspdp\M}         1 } } */
+/* { dg-final { scan-assembler-times {\mxvadddp\M}          1 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M}          3 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M}         1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c
new file mode 100644
index 00000000000..2facb727053
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-2.c
@@ -0,0 +1,134 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 8 float elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvaddsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvsubsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_sub (*x, *y);
+}
+
+void
+test_multiply (__vector_pair *dest,
+	       __vector_pair *x,
+	       __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmulsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_mul (*x, *y);
+}
+
+void
+test_max (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvmaxsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_max (*x, *y);
+}
+
+void
+test_min (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xvminsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnegsp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_neg (*x);
+}
+
+void
+test_abs (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvabssp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_abs (*x);
+}
+
+void
+test_negative_abs (__vector_pair *dest,
+		   __vector_pair *x)
+{
+  /* 1 lxvp, 2 xvnabssp, 1 stxvp.  */
+  __vector_pair ab = __builtin_vpair_f32_abs (*x);
+  *dest = __builtin_vpair_f32_neg (ab);
+}
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp.  */
+  *dest = __builtin_vpair_f32_fma (*x, *y, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f32_neg (*z);
+  *dest = __builtin_vpair_f32_fma (*x, *y, n);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp.  */
+  __vector_pair w = __builtin_vpair_f32_fma (*x, *y, *z);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_f32_neg (*z);
+  __vector_pair w = __builtin_vpair_f32_fma (*x, *y, n);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       25 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}      12 } } */
+/* { dg-final { scan-assembler-times {\mxvabssp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmaxsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvminsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmulsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvnabssp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M}  2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c
new file mode 100644
index 00000000000..65bfc44f85d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-3.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -Ofast" } */
+
+/* Test whether the vector built-in code combines multiply, add/subtract, and
+   negate operations to the appropriate fused multiply-add instruction for
+   vector pairs with 4 double elements.  */
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f64_mul (*x, *y);
+  *dest = __builtin_vpair_f64_add (m, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}dp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f64_mul (*x, *y);
+  *dest = __builtin_vpair_f64_sub (m, *z);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}dp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f64_mul (*x, *y);
+  __vector_pair w = __builtin_vpair_f64_add (m, *z);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}dp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f64_mul (*x, *y);
+  __vector_pair w = __builtin_vpair_f64_sub (m, *z);
+  *dest = __builtin_vpair_f64_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}        12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}        4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.dp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.dp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.dp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.dp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c
new file mode 100644
index 00000000000..b62871be1fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-4.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -Ofast" } */
+
+/* Test whether the vector built-in code combines multiply, add/subtract, and
+   negate operations to the appropriate fused multiply-add instruction for
+   vector pairs with 8 float elements.  */
+
+void
+test_fma (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmadd{a,m}sp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f32_mul (*x, *y);
+  *dest = __builtin_vpair_f32_add (m, *z);
+}
+
+void
+test_fms (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y,
+	  __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f32_mul (*x, *y);
+  *dest = __builtin_vpair_f32_sub (m, *z);
+}
+
+void
+test_nfma (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmadd{a,m}sp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f32_mul (*x, *y);
+  __vector_pair w = __builtin_vpair_f32_add (m, *z);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+void
+test_nfms (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y,
+	   __vector_pair *z)
+{
+  /* 3 lxvp, 2 xvnmsub{a,m}sp, 1 stxvp.  */
+  __vector_pair m = __builtin_vpair_f32_mul (*x, *y);
+  __vector_pair w = __builtin_vpair_f32_sub (m, *z);
+  *dest = __builtin_vpair_f32_neg (w);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}        12 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}        4 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
new file mode 100644
index 00000000000..924919cae1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-5.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 4 64-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vaddudm, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubudm, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_sub (*x, *y);
+}
+
+void
+test_and (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_and (*x, *y);
+}
+
+void
+test_or (__vector_pair *dest,
+	 __vector_pair *x,
+	 __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_ior (*x, *y);
+}
+
+void
+test_xor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_xor (*x, *y);
+}
+
+void
+test_smax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxsd, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_max (*x, *y);
+}
+
+void
+test_smin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminsd, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_min (*x, *y);
+}
+
+void
+test_umax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxud, 1 stxvp.  */
+  *dest = __builtin_vpair_i64u_max (*x, *y);
+}
+
+void
+test_umin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminud, 1 stxvp.  */
+  *dest = __builtin_vpair_i64u_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 vnegd, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_neg (*x);
+}
+
+void
+test_not (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = __builtin_vpair_i64_not (*x);
+}
+
+/* Combination of logical operators.  */
+
+void
+test_andc_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i64_not (*y);
+  *dest = __builtin_vpair_i64_and (*x, n);
+}
+
+void
+test_andc_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i64_not (*x);
+  *dest = __builtin_vpair_i64_and (n, *y);
+}
+
+void
+test_orc_1 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i64_not (*y);
+  *dest = __builtin_vpair_i64_ior (*x, n);
+}
+
+void
+test_orc_2 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i64_not (*x);
+  *dest = __builtin_vpair_i64_ior (n, *y);
+}
+
+void
+test_nand_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i64_and (*x, *y);
+  *dest = __builtin_vpair_i64_not (a);
+}
+
+void
+test_nand_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair nx = __builtin_vpair_i64_not (*x);
+  __vector_pair ny = __builtin_vpair_i64_not (*y);
+  *dest = __builtin_vpair_i64_ior (nx, ny);
+}
+
+void
+test_nor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i64_ior (*x, *y);
+  *dest = __builtin_vpair_i64_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}    34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}   18 } } */
+/* { dg-final { scan-assembler-times {\mvaddudm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsd\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxud\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvminsd\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvminud\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvnegd\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsubudm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
new file mode 100644
index 00000000000..f22949c1f95
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-6.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 8 32-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vadduwm, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubuwm, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_sub (*x, *y);
+}
+
+void
+test_and (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_and (*x, *y);
+}
+
+void
+test_or (__vector_pair *dest,
+	 __vector_pair *x,
+	 __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_ior (*x, *y);
+}
+
+void
+test_xor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_xor (*x, *y);
+}
+
+void
+test_smax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxsw, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_max (*x, *y);
+}
+
+void
+test_smin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminsw, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_min (*x, *y);
+}
+
+void
+test_umax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxuw, 1 stxvp.  */
+  *dest = __builtin_vpair_i32u_max (*x, *y);
+}
+
+void
+test_umin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminuw, 1 stxvp.  */
+  *dest = __builtin_vpair_i32u_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 2 vnegw, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_neg (*x);
+}
+
+void
+test_not (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = __builtin_vpair_i32_not (*x);
+}
+
+/* Combination of logical operators.  */
+
+void
+test_andc_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i32_not (*y);
+  *dest = __builtin_vpair_i32_and (*x, n);
+}
+
+void
+test_andc_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i32_not (*x);
+  *dest = __builtin_vpair_i32_and (n, *y);
+}
+
+void
+test_orc_1 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i32_not (*y);
+  *dest = __builtin_vpair_i32_ior (*x, n);
+}
+
+void
+test_orc_2 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i32_not (*x);
+  *dest = __builtin_vpair_i32_ior (n, *y);
+}
+
+void
+test_nand_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i32_and (*x, *y);
+  *dest = __builtin_vpair_i32_not (a);
+}
+
+void
+test_nand_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair nx = __builtin_vpair_i32_not (*x);
+  __vector_pair ny = __builtin_vpair_i32_not (*y);
+  *dest = __builtin_vpair_i32_ior (nx, ny);
+}
+
+void
+test_nor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i32_ior (*x, *y);
+  *dest = __builtin_vpair_i32_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}    34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}   18 } } */
+/* { dg-final { scan-assembler-times {\mvadduwm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsw\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxuw\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvminsw\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvminuw\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvnegw\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuwm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
new file mode 100644
index 00000000000..71452f59284
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-7.c
@@ -0,0 +1,193 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 16 16-bit integer elements.  */
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vadduhm, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsubuhm, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_sub (*x, *y);
+}
+
+void
+test_and (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_and (*x, *y);
+}
+
+void
+test_or (__vector_pair *dest,
+	 __vector_pair *x,
+	 __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_ior (*x, *y);
+}
+
+void
+test_xor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_xor (*x, *y);
+}
+
+void
+test_smax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxsh, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_max (*x, *y);
+}
+
+void
+test_smin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminsh, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_min (*x, *y);
+}
+
+void
+test_umax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxuh, 1 stxvp.  */
+  *dest = __builtin_vpair_i16u_max (*x, *y);
+}
+
+void
+test_umin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminuh, 1 stxvp.  */
+  *dest = __builtin_vpair_i16u_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 1 xxspltib, 2 vsubuhm, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_neg (*x);
+}
+
+void
+test_not (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = __builtin_vpair_i16_not (*x);
+}
+
+/* Combination of logical operators.  */
+
+void
+test_andc_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i16_not (*y);
+  *dest = __builtin_vpair_i16_and (*x, n);
+}
+
+void
+test_andc_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i16_not (*x);
+  *dest = __builtin_vpair_i16_and (n, *y);
+}
+
+void
+test_orc_1 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i16_not (*y);
+  *dest = __builtin_vpair_i16_ior (*x, n);
+}
+
+void
+test_orc_2 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i16_not (*x);
+  *dest = __builtin_vpair_i16_ior (n, *y);
+}
+
+void
+test_nand_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i16_and (*x, *y);
+  *dest = __builtin_vpair_i16_not (a);
+}
+
+void
+test_nand_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair nx = __builtin_vpair_i16_not (*x);
+  __vector_pair ny = __builtin_vpair_i16_not (*y);
+  *dest = __builtin_vpair_i16_ior (nx, ny);
+}
+
+void
+test_nor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i16_ior (*x, *y);
+  *dest = __builtin_vpair_i16_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    18 } } */
+/* { dg-final { scan-assembler-times {\mvadduhm\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsh\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxuh\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvminsh\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvminuh\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuhm\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
new file mode 100644
index 00000000000..8db9056d4cc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-8.c
@@ -0,0 +1,194 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test whether the vector built-in code generates the expected instructions for
+   vector pairs with 32 8-bit integer elements.  */
+
+
+void
+test_add (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vaddubm, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_add (*x, *y);
+}
+
+void
+test_sub (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 vsububm, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_sub (*x, *y);
+}
+
+void
+test_and (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_and (*x, *y);
+}
+
+void
+test_or (__vector_pair *dest,
+	 __vector_pair *x,
+	 __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_ior (*x, *y);
+}
+
+void
+test_xor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_xor (*x, *y);
+}
+
+void
+test_smax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxsb, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_max (*x, *y);
+}
+
+void
+test_smin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminsb, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_min (*x, *y);
+}
+
+void
+test_umax (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vmaxub, 1 stxvp.  */
+  *dest = __builtin_vpair_i8u_max (*x, *y);
+}
+
+void
+test_umin (__vector_pair *dest,
+	   __vector_pair *x,
+	   __vector_pair *y)
+{
+  /* 2 lxvp, 2 vminub, 1 stxvp.  */
+  *dest = __builtin_vpair_i8u_min (*x, *y);
+}
+
+void
+test_negate (__vector_pair *dest,
+	     __vector_pair *x)
+{
+  /* 1 lxvp, 1 xxspltib, 2 vsububm, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_neg (*x);
+}
+
+void
+test_not (__vector_pair *dest,
+	  __vector_pair *x)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = __builtin_vpair_i8_not (*x);
+}
+
+/* Combination of logical operators.  */
+
+void
+test_andc_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i8_not (*y);
+  *dest = __builtin_vpair_i8_and (*x, n);
+}
+
+void
+test_andc_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i8_not (*x);
+  *dest = __builtin_vpair_i8_and (n, *y);
+}
+
+void
+test_orc_1 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i8_not (*y);
+  *dest = __builtin_vpair_i8_ior (*x, n);
+}
+
+void
+test_orc_2 (__vector_pair *dest,
+	    __vector_pair *x,
+	    __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  __vector_pair n = __builtin_vpair_i8_not (*x);
+  *dest = __builtin_vpair_i8_ior (n, *y);
+}
+
+void
+test_nand_1 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i8_and (*x, *y);
+  *dest = __builtin_vpair_i8_not (a);
+}
+
+void
+test_nand_2 (__vector_pair *dest,
+	     __vector_pair *x,
+	     __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  __vector_pair nx = __builtin_vpair_i8_not (*x);
+  __vector_pair ny = __builtin_vpair_i8_not (*y);
+  *dest = __builtin_vpair_i8_ior (nx, ny);
+}
+
+void
+test_nor (__vector_pair *dest,
+	  __vector_pair *x,
+	  __vector_pair *y)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  __vector_pair a = __builtin_vpair_i8_ior (*x, *y);
+  *dest = __builtin_vpair_i8_not (a);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     34 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    18 } } */
+/* { dg-final { scan-assembler-times {\mvaddubm\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxsb\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvmaxub\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvminsb\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvminub\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsububm\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c
new file mode 100644
index 00000000000..95504a5afd0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-pair-9.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+void
+test_zero (__vector_pair *p)
+{
+  /* 2 xxspltib.  */
+  *p = __builtin_vpair_zero ();
+}
+
+/* { dg-final { scan-assembler-times {\mstxvp\M}    1 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
