[gcc(refs/users/meissner/heads/work134-vsize)] Add support for -mvector-pair.

* [gcc(refs/users/meissner/heads/work134-vsize)] Add support for -mvector-pair.
@ 2023-09-21 18:55 Michael Meissner
From: Michael Meissner @ 2023-09-21 18:55 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:7af190bd4a6b26b6a18e9901179e0f3bc32e0357

commit 7af190bd4a6b26b6a18e9901179e0f3bc32e0357
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Sep 21 14:54:26 2023 -0400

    Add support for -mvector-pair.
    
    2023-09-21  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (const_0_to_31_operand): New predicate.
            * config/rs6000/rs6000-c.cc (rs6000_cpu_cpp_builtins): Define
            __VECTOR_PAIR__ if -mvector-pair was used.
            * config/rs6000/rs6000-protos.h (vector_pair_to_vector_mode): New
            declaration.
            (rs6000_adjust_for_vector_pair): Likewise.
            (split_unary_vector_pair): Likewise.
            (split_binary_vector_pair): Likewise.
            (split_fma_vector_pair): Likewise.
            * config/rs6000/rs6000.cc (rs6000_hard_regno_mode_ok_uncached): Add
            support for 32-byte vector types created with -mvector-pair.
            (rs6000_modes_tieable_p): Make all 32-byte vectors tie with other
            32-byte vectors.
            (rs6000_debug_reg_global): If -mdebug=reg, print whether -mvector-pair
            was enabled.
            (rs6000_init_hard_regno_mode_ok): Add support for 32-byte vectors.
            (rs6000_option_override_internal): Add checking for -mvector-pair.
            (rs6000_expand_vector_extract): Add support for 32-byte vectors.
            (reg_offset_addressing_ok_p): Likewise.
            (rs6000_emit_move): Likewise.
            (rs6000_preferred_reload_class): Likewise.
            (vector_pair_to_vector_mode): New vector pair helper function.
            (rs6000_adjust_for_vector_pair): Likewise.
            (rs6000_split_vpair_constant): Likewise.
            (split_unary_vector_pair): Likewise.
            (split_binary_vector_pair): Likewise.
            (split_fma_vector_pair): Likewise.
            (rs6000_split_multireg_move): Add support for 32-byte vectors.
            * config/rs6000/rs6000.h (VECTOR_PAIR_MODE): New macro.
            * config/rs6000/rs6000.md (wd attribute): Add 32-byte vector modes.
            (RELOAD): Likewise.
            (toplevel): Include vector-pair.md.
            * config/rs6000/rs6000.opt (-mvector-pair): New option.
            * config/rs6000/t-rs6000 (MD_INCLUDES): Add vector-pair.md.
            * config/rs6000/vector-pair.md: New file.
            * config/rs6000/vector.md (VEC_base): Add 32-byte vector modes.
            * config/rs6000/vsx.md (VSX_EXTRACT_PREDICATE): Likewise.
            (VSX_EX): Likewise.
            (VPAIR_V4DI_V4DF): New mode iterator.
            (VPAIR_VECTOR): New mode attribute.
            (vpair_vector): Likewise.
            (vsx_extract_<mode>, VPAIR_V4DF_V4DI iterator): New extract insn for
            vector pair support.
            (vsx_extract_v8sf): Likewise.
            (vsx_extract_<mode>, VPAIR_SMALL_INT iterator): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vector-size-32-1.c: New test.
            * gcc.target/powerpc/vector-size-32-2.c: New test.
            * gcc.target/powerpc/vector-size-32-3.c: New test.
            * gcc.target/powerpc/vector-size-32-4.c: New test.
            * gcc.target/powerpc/vector-size-32-5.c: New test.
            * gcc.target/powerpc/vector-size-32-6.c: New test.
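
For orientation, here is a minimal sketch of the kind of user code the option is aimed at. It assumes GCC's generic `vector_size` attribute. The names `v4df_t`, `mul_add`, and `built_with_vector_pair` are hypothetical and are not taken from the new tests. The `__VECTOR_PAIR__` macro is the one this commit defines when -mvector-pair is used. Without the option the same source should still build, since GCC lowers generic vectors itself.

```c
/* Hypothetical 32-byte vector of four doubles (GCC generic vector
   extension); with -mvector-pair this would correspond to V4DFmode.  */
typedef double v4df_t __attribute__ ((vector_size (32)));

/* Element-wise a * b + c.  Operands are passed by pointer to sidestep
   target-specific rules for passing wide vectors by value.  */
void
mul_add (v4df_t *dest, const v4df_t *a, const v4df_t *b, const v4df_t *c)
{
  *dest = (*a * *b) + *c;
}

/* Return 1 if the translation unit was compiled with -mvector-pair,
   as signalled by the __VECTOR_PAIR__ macro, and 0 otherwise.  */
int
built_with_vector_pair (void)
{
#ifdef __VECTOR_PAIR__
  return 1;
#else
  return 0;
#endif
}
```

Comparing the assembly produced with and without -mvector-pair (together with a CPU setting that enables MMA) is probably the quickest way to see whether the paired load/store instructions are being used.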

Diff:
---
 gcc/config/rs6000/predicates.md                    |   5 +
 gcc/config/rs6000/rs6000-c.cc                      |   3 +
 gcc/config/rs6000/rs6000-protos.h                  |   7 +
 gcc/config/rs6000/rs6000.cc                        | 337 ++++++++++++-
 gcc/config/rs6000/rs6000.h                         |   6 +
 gcc/config/rs6000/rs6000.md                        |   6 +
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/t-rs6000                         |   1 +
 gcc/config/rs6000/vector-pair.md                   | 536 +++++++++++++++++++++
 gcc/config/rs6000/vector.md                        |   8 +-
 gcc/config/rs6000/vsx.md                           | 120 ++++-
 .../gcc.target/powerpc/vector-size-32-1.c          |  85 ++++
 .../gcc.target/powerpc/vector-size-32-2.c          |  96 ++++
 .../gcc.target/powerpc/vector-size-32-3.c          | 137 ++++++
 .../gcc.target/powerpc/vector-size-32-4.c          | 137 ++++++
 .../gcc.target/powerpc/vector-size-32-5.c          | 137 ++++++
 .../gcc.target/powerpc/vector-size-32-6.c          | 137 ++++++
 17 files changed, 1744 insertions(+), 18 deletions(-)
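
The register and element selection done by rs6000_adjust_for_vector_pair in the rs6000.cc hunk can be sketched as standalone C, outside of GCC. This is an illustration only: `struct pair_slot` and `pair_slot_for_element` are hypothetical names, and the `big_endian` flag stands in for WORDS_BIG_ENDIAN.

```c
#include <stdbool.h>

/* Hypothetical result type: which register of the pair holds the
   element, and the element's index within that 16-byte register.  */
struct pair_slot
{
  unsigned regno;
  int element;
};

/* BASE_REGNO is the first register of the pair, NUNITS is the number of
   elements in one 16-byte vector, and ELEMENT indexes the 32-byte vector.
   On little endian the two registers are swapped relative to element
   order, mirroring the logic in the patch.  */
struct pair_slot
pair_slot_for_element (unsigned base_regno, int nunits, int element,
		       bool big_endian)
{
  struct pair_slot slot = { base_regno, element };

  if (element >= nunits)
    {
      /* Upper half of the 32-byte vector.  */
      slot.element -= nunits;
      if (big_endian)
	slot.regno++;
    }
  else if (!big_endian)
    /* Lower half lives in the second register on little endian.  */
    slot.regno++;

  return slot;
}
```

For example, with a V4DF value (two doubles per register) whose pair starts at register 32, element 3 maps to element 1 of register 32 on little endian and to element 1 of register 33 on big endian.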

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 925f69cd3fc..0b3baa9e6b9 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -327,6 +327,11 @@
   (and (match_code "const_int")
        (match_test "IN_RANGE (INTVAL (op), 0, 15)")))
 
+;; Match op = 0..31
+(define_predicate "const_0_to_31_operand"
+  (and (match_code "const_int")
+       (match_test "IN_RANGE (INTVAL (op), 0, 31)")))
+
 ;; Return 1 if op is a 34-bit constant integer.
 (define_predicate "cint34_operand"
   (match_code "const_int")
diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 65be0ac43e2..9246eb2e584 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -631,6 +631,9 @@ rs6000_cpu_cpp_builtins (cpp_reader *pfile)
     builtin_define ("__SIZEOF_IBM128__=16");
   if (ieee128_float_type_node)
     builtin_define ("__SIZEOF_IEEE128__=16");
+  if (TARGET_VECTOR_PAIR)
+    builtin_define ("__VECTOR_PAIR__");
+
 #ifdef TARGET_LIBC_PROVIDES_HWCAP_IN_TCB
   builtin_define ("__BUILTIN_CPU_SUPPORTS__");
 #endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f70118ea40f..b260fe31180 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -138,6 +138,13 @@ extern void rs6000_emit_swsqrt (rtx, rtx, bool);
 extern void output_toc (FILE *, rtx, int, machine_mode);
 extern void rs6000_fatal_bad_address (rtx);
 extern rtx create_TOC_reference (rtx, rtx);
+extern machine_mode vector_pair_to_vector_mode (machine_mode);
+extern machine_mode rs6000_adjust_for_vector_pair (machine_mode, rtx *, int *);
+extern void split_unary_vector_pair (machine_mode, rtx [], rtx (*)(rtx, rtx));
+extern void split_binary_vector_pair (machine_mode, rtx [],
+				      rtx (*)(rtx, rtx, rtx));
+extern void split_fma_vector_pair (machine_mode, rtx [],
+				   rtx (*)(rtx, rtx, rtx, rtx));
 extern void rs6000_split_multireg_move (rtx, rtx);
 extern void rs6000_emit_le_vsx_permute (rtx, rtx, machine_mode);
 extern void rs6000_emit_le_vsx_move (rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 4bfc62d930b..33efbcee9af 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -1843,7 +1843,7 @@ rs6000_hard_regno_mode_ok_uncached (int regno, machine_mode mode)
 
   /* Vector pair modes need even/odd VSX register pairs.  Only allow vector
      registers.  */
-  if (mode == OOmode)
+  if (VECTOR_PAIR_MODE (mode))
     return (TARGET_MMA && VSX_REGNO_P (regno) && (regno & 1) == 0);
 
   /* MMA accumulator modes need FPR registers divisible by 4.  */
@@ -1954,9 +1954,10 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
    GPR registers, and TImode can go in any GPR as well as VSX registers (PR
    57744).
 
-   Similarly, don't allow OOmode (vector pair, restricted to even VSX
-   registers) or XOmode (vector quad, restricted to FPR registers divisible
-   by 4) to tie with other modes.
+   Similarly, don't allow XOmode (vector quad, restricted to FPR registers
+   divisible by 4) to tie with other modes.
+
+   Vector pair modes can tie with other vector pair modes.
 
    Altivec/VSX vector tests were moved ahead of scalar float mode, so that IEEE
    128-bit floating point on VSX systems ties with other vectors.  */
@@ -1964,10 +1965,15 @@ rs6000_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 static bool
 rs6000_modes_tieable_p (machine_mode mode1, machine_mode mode2)
 {
-  if (mode1 == PTImode || mode1 == OOmode || mode1 == XOmode
-      || mode2 == PTImode || mode2 == OOmode || mode2 == XOmode)
+  if (mode1 == PTImode || mode1 == XOmode || mode2 == PTImode
+      || mode2 == XOmode)
     return mode1 == mode2;
 
+  if (VECTOR_PAIR_MODE (mode1))
+    return VECTOR_PAIR_MODE (mode2);
+  if (VECTOR_PAIR_MODE (mode2))
+    return false;
+
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode1))
     return ALTIVEC_OR_VSX_VECTOR_MODE (mode2);
   if (ALTIVEC_OR_VSX_VECTOR_MODE (mode2))
@@ -2578,10 +2584,11 @@ rs6000_debug_reg_global (void)
 	     (int)VECTOR_ELEMENT_MFVSRLD_64BIT);
 
   if (TARGET_MMA)
-    fprintf (stderr, DEBUG_FMT_ID "%s, %s\n",
+    fprintf (stderr, DEBUG_FMT_ID "%s, %s, %s\n",
 	     "vector_pair",
 	     TARGET_LXVP ? "lxvp" : "no-lxvp",
-	     TARGET_STXVP ? "stxvp" : "no-stxvp");
+	     TARGET_STXVP ? "stxvp" : "no-stxvp",
+	     TARGET_VECTOR_PAIR ? "vector-pair" : "no-vector-pair");
 }
 
 \f
@@ -2945,6 +2952,33 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
       rs6000_vector_align[XOmode] = 512;
     }
 
+  if (TARGET_VECTOR_PAIR)
+    {
+      rs6000_vector_unit[V32QImode] = VECTOR_NONE;
+      rs6000_vector_mem[V32QImode] = VECTOR_VSX;
+      rs6000_vector_align[V32QImode] = 256;
+
+      rs6000_vector_unit[V16HImode] = VECTOR_NONE;
+      rs6000_vector_mem[V16HImode] = VECTOR_VSX;
+      rs6000_vector_align[V16HImode] = 256;
+
+      rs6000_vector_unit[V8SImode] = VECTOR_NONE;
+      rs6000_vector_mem[V8SImode] = VECTOR_VSX;
+      rs6000_vector_align[V8SImode] = 256;
+
+      rs6000_vector_unit[V4DImode] = VECTOR_NONE;
+      rs6000_vector_mem[V4DImode] = VECTOR_VSX;
+      rs6000_vector_align[V4DImode] = 256;
+
+      rs6000_vector_unit[V8SFmode] = VECTOR_NONE;
+      rs6000_vector_mem[V8SFmode] = VECTOR_VSX;
+      rs6000_vector_align[V8SFmode] = 256;
+
+      rs6000_vector_unit[V4DFmode] = VECTOR_NONE;
+      rs6000_vector_mem[V4DFmode] = VECTOR_VSX;
+      rs6000_vector_align[V4DFmode] = 256;
+    }
+
   /* Register class constraints for the constraints that depend on compile
      switches. When the VSX code was added, different constraints were added
      based on the type (DFmode, V2DFmode, V4SFmode).  For the vector types, all
@@ -3076,6 +3110,22 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
 		  reg_addr[XOmode].reload_store = CODE_FOR_reload_xo_di_store;
 		  reg_addr[XOmode].reload_load = CODE_FOR_reload_xo_di_load;
 		}
+
+	      if (TARGET_VECTOR_PAIR)
+		{
+		  reg_addr[V32QImode].reload_store = CODE_FOR_reload_v32qi_di_store;
+		  reg_addr[V32QImode].reload_load = CODE_FOR_reload_v32qi_di_load;
+		  reg_addr[V16HImode].reload_store = CODE_FOR_reload_v16hi_di_store;
+		  reg_addr[V16HImode].reload_load = CODE_FOR_reload_v16hi_di_load;
+		  reg_addr[V8SImode].reload_store = CODE_FOR_reload_v8si_di_store;
+		  reg_addr[V8SImode].reload_load = CODE_FOR_reload_v8si_di_load;
+		  reg_addr[V4DImode].reload_store = CODE_FOR_reload_v4di_di_store;
+		  reg_addr[V4DImode].reload_load = CODE_FOR_reload_v4di_di_load;
+		  reg_addr[V8SFmode].reload_store = CODE_FOR_reload_v8sf_di_store;
+		  reg_addr[V8SFmode].reload_load = CODE_FOR_reload_v8sf_di_load;
+		  reg_addr[V4DFmode].reload_store = CODE_FOR_reload_v4df_di_store;
+		  reg_addr[V4DFmode].reload_load = CODE_FOR_reload_v4df_di_load;
+		}
 	    }
 	}
       else
@@ -3133,6 +3183,22 @@ rs6000_init_hard_regno_mode_ok (bool global_init_p)
 	      reg_addr[DDmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdd;
 	      reg_addr[DFmode].reload_fpr_gpr = CODE_FOR_reload_fpr_from_gprdf;
 	    }
+
+	  if (TARGET_VECTOR_PAIR)
+	    {
+	      reg_addr[V32QImode].reload_store = CODE_FOR_reload_v32qi_si_store;
+	      reg_addr[V32QImode].reload_load = CODE_FOR_reload_v32qi_si_load;
+	      reg_addr[V16HImode].reload_store = CODE_FOR_reload_v16hi_si_store;
+	      reg_addr[V16HImode].reload_load = CODE_FOR_reload_v16hi_si_load;
+	      reg_addr[V8SImode].reload_store = CODE_FOR_reload_v8si_si_store;
+	      reg_addr[V8SImode].reload_load = CODE_FOR_reload_v8si_si_load;
+	      reg_addr[V4DImode].reload_store = CODE_FOR_reload_v4di_si_store;
+	      reg_addr[V4DImode].reload_load = CODE_FOR_reload_v4di_si_load;
+	      reg_addr[V8SFmode].reload_store = CODE_FOR_reload_v8sf_si_store;
+	      reg_addr[V8SFmode].reload_load = CODE_FOR_reload_v8sf_si_load;
+	      reg_addr[V4DFmode].reload_store = CODE_FOR_reload_v4df_si_store;
+	      reg_addr[V4DFmode].reload_load = CODE_FOR_reload_v4df_si_load;
+	    }
 	}
 
       reg_addr[DFmode].scalar_in_vmx_p = true;
@@ -7649,6 +7715,48 @@ rs6000_expand_vector_extract (rtx target, rtx vec, rtx elt)
 	      return;
 	    }
 	  break;
+	case E_V4DImode:
+	  if (TARGET_MMA && TARGET_VECTOR_PAIR)
+	    {
+	      emit_insn (gen_vsx_extract_v4di (target, vec, elt));
+	      return;
+	    }
+	  break;
+	case E_V8SImode:
+	  if (TARGET_MMA && TARGET_VECTOR_PAIR)
+	    {
+	      emit_insn (gen_vsx_extract_v8si (target, vec, elt));
+	      return;
+	    }
+	  break;
+	case E_V16HImode:
+	  if (TARGET_MMA && TARGET_VECTOR_PAIR)
+	    {
+	      emit_insn (gen_vsx_extract_v16hi (target, vec, elt));
+	      return;
+	    }
+	  break;
+	case E_V32QImode:
+	  if (TARGET_MMA && TARGET_VECTOR_PAIR)
+	    {
+	      emit_insn (gen_vsx_extract_v32qi (target, vec, elt));
+	      return;
+	    }
+	  break;
+	case E_V4DFmode:
+	  if (TARGET_MMA)
+	    {
+	      emit_insn (gen_vsx_extract_v4df (target, vec, elt));
+	      return;
+	    }
+	  break;
+	case E_V8SFmode:
+	  if (TARGET_MMA)
+	    {
+	      emit_insn (gen_vsx_extract_v8sf (target, vec, elt));
+	      return;
+	    }
+	  break;
 	}
     }
   else if (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (elt)
@@ -8690,6 +8798,12 @@ reg_offset_addressing_ok_p (machine_mode mode)
       /* The vector pair/quad types support offset addressing if the
 	 underlying vectors support offset addressing.  */
     case E_OOmode:
+    case E_V8SFmode:
+    case E_V4DFmode:
+    case E_V32QImode:
+    case E_V16HImode:
+    case E_V8SImode:
+    case E_V4DImode:
     case E_XOmode:
       return TARGET_MMA;
 
@@ -10983,11 +11097,17 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
 	operands[1] = force_const_mem (mode, operands[1]);
       break;
 
+    case E_V32QImode:
     case E_V16QImode:
+    case E_V16HImode:
     case E_V8HImode:
+    case E_V8SFmode:
     case E_V4SFmode:
+    case E_V8SImode:
     case E_V4SImode:
+    case E_V4DFmode:
     case E_V2DFmode:
+    case E_V4DImode:
     case E_V2DImode:
     case E_V1TImode:
       if (CONSTANT_P (operands[1])
@@ -13244,7 +13364,7 @@ rs6000_preferred_reload_class (rtx x, enum reg_class rclass)
      the GPR registers.  */
   if (rclass == GEN_OR_FLOAT_REGS)
     {
-      if (mode == OOmode)
+      if (VECTOR_PAIR_MODE (mode))
 	return VSX_REGS;
 
       if (mode == XOmode)
@@ -27185,6 +27305,165 @@ rs6000_split_logical (rtx operands[3],
   return;
 }
 
+/* For a vector pair mode, return the equivalent vector mode or VOIDmode.  */
+
+machine_mode
+vector_pair_to_vector_mode (machine_mode mode)
+{
+  machine_mode vmode;
+
+  switch (mode)
+    {
+    case E_V32QImode: vmode = V16QImode; break;
+    case E_V16HImode: vmode = V8HImode;  break;
+    case E_V8SImode:  vmode = V4SImode;  break;
+    case E_V8SFmode:  vmode = V4SFmode;  break;
+    case E_V4DImode:  vmode = V2DImode;  break;
+    case E_V4DFmode:  vmode = V2DFmode;  break;
+    default:          vmode = VOIDmode;  break;
+    }
+
+  return vmode;
+}
+
+/* Adjust a vector pair register, element number, and mode to reflect the
+   vector register after splitting it.
+
+   Return the vector mode if the mode is a vector pair, or the original mode if
+   it wasn't.
+
+   ORIG_MODE is the original mode, P_REG is a pointer to the register to be
+   adjusted, and P_ELEMENT is a pointer to the element number to be adjusted.  */
+
+machine_mode
+rs6000_adjust_for_vector_pair (machine_mode orig_mode,
+			       rtx *p_reg,
+			       int *p_element)
+{
+  machine_mode vmode = vector_pair_to_vector_mode (orig_mode);
+
+  /* Return if not a vector pair.  */
+  if (vmode == VOIDmode)
+    return orig_mode;
+
+  unsigned regno = reg_or_subregno (*p_reg);
+  int element = *p_element;
+
+  /* Choose which register.  We have to reverse the words for little
+     endian.  */
+  int nunits = GET_MODE_NUNITS (vmode);
+  if (element >= nunits)
+    {
+      element -= nunits;
+      if (WORDS_BIG_ENDIAN)
+	regno++;
+    }
+  else if (!WORDS_BIG_ENDIAN)
+    regno++;
+
+  /* Adjust elements.  */
+  *p_reg = gen_rtx_REG (vmode, regno);
+  *p_element = element;
+  return vmode;
+}
+
+
+/* Split a vector constant whose type is held in a vector register pair into
+   two separate constants that can each be held in a single vector register.
+   Return true if we can split the constant.  */
+
+bool
+rs6000_split_vpair_constant (rtx op, rtx *high, rtx *low)
+{
+  machine_mode vmode = vector_pair_to_vector_mode (GET_MODE (op));
+
+  *high = *low = NULL_RTX;
+
+  if (!CONST_VECTOR_P (op) || vmode == VOIDmode)
+    return false;
+
+  size_t nunits = GET_MODE_NUNITS (vmode);
+  rtvec hi_vec = rtvec_alloc (nunits);
+  rtvec lo_vec = rtvec_alloc (nunits);
+
+  for (size_t i = 0; i < nunits; i++)
+    {
+      RTVEC_ELT (hi_vec, i) = CONST_VECTOR_ELT (op, i);
+      RTVEC_ELT (lo_vec, i) = CONST_VECTOR_ELT (op, i + nunits);
+    }
+
+  *high = gen_rtx_CONST_VECTOR (vmode, hi_vec);
+  *low = gen_rtx_CONST_VECTOR (vmode, lo_vec);
+  return true;
+}
+
+/* Split a unary vector pair insn into two separate vector insns.  */
+
+void
+split_unary_vector_pair (machine_mode mode,		/* vector mode.  */
+			 rtx operands[],		/* dest, src.  */
+			 rtx (*func)(rtx, rtx))		/* create insn.  */
+{
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+
+  emit_insn (func (gen_rtx_REG (mode, reg0),
+		   gen_rtx_REG (mode, reg1)));
+
+  emit_insn (func (gen_rtx_REG (mode, reg0 + 1),
+		   gen_rtx_REG (mode, reg1 + 1)));
+
+  return;
+}
+
+/* Split a binary vector pair insn into two separate vector insns.  */
+
+void
+split_binary_vector_pair (machine_mode mode,		/* vector mode.  */
+			 rtx operands[],		/* dest, src1, src2.  */
+			 rtx (*func)(rtx, rtx, rtx))	/* create insn.  */
+{
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+  unsigned reg2 = reg_or_subregno (operands[2]);
+
+  emit_insn (func (gen_rtx_REG (mode, reg0),
+		   gen_rtx_REG (mode, reg1),
+		   gen_rtx_REG (mode, reg2)));
+
+  emit_insn (func (gen_rtx_REG (mode, reg0 + 1),
+		   gen_rtx_REG (mode, reg1 + 1),
+		   gen_rtx_REG (mode, reg2 + 1)));
+
+  return;
+}
+
+/* Split a fused multiply-add vector pair insn into two separate vector
+   insns.  */
+
+void
+split_fma_vector_pair (machine_mode mode,		/* vector mode.  */
+		       rtx operands[],			/* dest, src1, src2, src3.  */
+		       rtx (*func)(rtx, rtx, rtx, rtx))	/* create insn.  */
+{
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+  unsigned reg2 = reg_or_subregno (operands[2]);
+  unsigned reg3 = reg_or_subregno (operands[3]);
+
+  emit_insn (func (gen_rtx_REG (mode, reg0),
+		   gen_rtx_REG (mode, reg1),
+		   gen_rtx_REG (mode, reg2),
+		   gen_rtx_REG (mode, reg3)));
+
+  emit_insn (func (gen_rtx_REG (mode, reg0 + 1),
+		   gen_rtx_REG (mode, reg1 + 1),
+		   gen_rtx_REG (mode, reg2 + 1),
+		   gen_rtx_REG (mode, reg3 + 1)));
+
+  return;
+}
+
 /* Emit instructions to move SRC to DST.  Called by splitters for
    multi-register moves.  It will emit at most one instruction for
    each register that is accessed; that is, it won't emit li/lis pairs
@@ -27203,6 +27482,8 @@ rs6000_split_multireg_move (rtx dst, rtx src)
   int reg_mode_size;
   /* The number of registers that will be moved.  */
   int nregs;
+  /* Hi/lo values for splitting vector pair constants.  */
+  rtx vpair_hi, vpair_lo;
 
   reg = REG_P (dst) ? REGNO (dst) : REGNO (src);
   mode = GET_MODE (dst);
@@ -27218,8 +27499,11 @@ rs6000_split_multireg_move (rtx dst, rtx src)
     }
   /* If we have a vector pair/quad mode, split it into two/four separate
      vectors.  */
-  else if (mode == OOmode || mode == XOmode)
-    reg_mode = V1TImode;
+  else if (VECTOR_PAIR_MODE (mode) || mode == XOmode)
+    {
+      machine_mode vmode = vector_pair_to_vector_mode (mode);
+      reg_mode = (vmode == VOIDmode) ? V1TImode : vmode;
+    }
   else if (FP_REGNO_P (reg))
     reg_mode = DECIMAL_FLOAT_MODE_P (mode) ? DDmode :
 	(TARGET_HARD_FLOAT ? DFmode : SFmode);
@@ -27231,6 +27515,29 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 
   gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode));
 
+  /* Handle vector pair constants.  */
+  if (CONST_VECTOR_P (src) && VECTOR_PAIR_MODE (mode) && TARGET_MMA
+      && rs6000_split_vpair_constant (src, &vpair_hi, &vpair_lo)
+      && VSX_REGNO_P (reg))
+    {
+      reg_mode = GET_MODE (vpair_hi);
+      rtx reg_hi = gen_rtx_REG (reg_mode, reg);
+      rtx reg_lo = gen_rtx_REG (reg_mode, reg + 1);
+
+      emit_move_insn (reg_hi, vpair_hi);
+
+      /* 0.0 is easy.  For other constants, copy the high register into the low
+	 register if the two sets of constants are equal.  This means we won't
+	 be doing back to back prefixed load immediate instructions.  */
+      if (rtx_equal_p (vpair_hi, vpair_lo)
+	  && !rtx_equal_p (vpair_hi, CONST0_RTX (reg_mode)))
+	emit_move_insn (reg_lo, reg_hi);
+      else
+	emit_move_insn (reg_lo, vpair_lo);
+      
+      return;
+    }
+      
   /* TDmode residing in FP registers is special, since the ISA requires that
      the lower-numbered word of a register pair is always the most significant
      word, even in little-endian mode.  This does not match the usual subreg
@@ -27270,7 +27577,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
      below.  This means the last register gets the first memory
      location.  We also need to be careful of using the right register
      numbers if we are splitting XO to OO.  */
-  if (mode == OOmode || mode == XOmode)
+  if (VECTOR_PAIR_MODE (mode) || mode == XOmode)
     {
       nregs = hard_regno_nregs (reg, mode);
       int reg_mode_nregs = hard_regno_nregs (reg, reg_mode);
@@ -27330,7 +27637,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	  gcc_assert (REG_P (dst));
 	  if (GET_MODE (src) == XOmode)
 	    gcc_assert (FP_REGNO_P (REGNO (dst)));
-	  if (GET_MODE (src) == OOmode)
+	  if (VECTOR_PAIR_MODE (GET_MODE (src)))
 	    gcc_assert (VSX_REGNO_P (REGNO (dst)));
 
 	  int nvecs = XVECLEN (src, 0);
@@ -27405,7 +27712,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	 overlap.  */
       int i;
       /* XO/OO are opaque so cannot use subregs. */
-      if (mode == OOmode || mode == XOmode )
+      if (VECTOR_PAIR_MODE (mode) || mode == XOmode )
 	{
 	  for (i = nregs - 1; i >= 0; i--)
 	    {
@@ -27579,7 +27886,7 @@ rs6000_split_multireg_move (rtx dst, rtx src)
 	    continue;
 
 	  /* XO/OO are opaque so cannot use subregs. */
-	  if (mode == OOmode || mode == XOmode )
+	  if (VECTOR_PAIR_MODE (mode) || mode == XOmode )
 	    {
 	      rtx dst_i = gen_rtx_REG (reg_mode, REGNO (dst) + j);
 	      rtx src_i = gen_rtx_REG (reg_mode, REGNO (src) + j);
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 3503614efbd..cd18fba018f 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -1006,6 +1006,12 @@ enum data_align { align_abi, align_opt, align_both };
   (ALTIVEC_VECTOR_MODE (MODE) || VSX_VECTOR_MODE (MODE)			\
    || (MODE) == V2DImode || (MODE) == V1TImode)
 
+/* Whether a mode is held in paired vector registers.  */
+#define VECTOR_PAIR_MODE(MODE)						\
+  ((MODE) == OOmode							\
+   || (MODE) == V32QImode || (MODE) == V16HImode || (MODE) == V8SImode	\
+   || (MODE) == V4DImode || (MODE) == V8SFmode || (MODE) == V4DFmode)
+
 /* Post-reload, we can't use any new AltiVec registers, as we already
    emitted the vrsave mask.  */
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 02de5afcd18..4d9e99c397d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -680,9 +680,13 @@
 		      (HI    "h")
 		      (SI    "w")
 		      (DI    "d")
+		      (V32QI "b")
 		      (V16QI "b")
+		      (V16HI "h")
 		      (V8HI  "h")
+		      (V8SI  "w")
 		      (V4SI  "w")
+		      (V4DI  "d")
 		      (V2DI  "d")
 		      (V1TI  "q")
 		      (TI    "q")])
@@ -808,6 +812,7 @@
 ;; Reload iterator for creating the function to allocate a base register to
 ;; supplement addressing modes.
 (define_mode_iterator RELOAD [V16QI V8HI V4SI V2DI V4SF V2DF V1TI
+			      V32QI V16HI V8SI V4DI V8SF V4DF
 			      SF SD SI DF DD DI TI PTI KF IF TF
 			      OO XO])
 
@@ -15734,6 +15739,7 @@
 (include "vsx.md")
 (include "altivec.md")
 (include "mma.md")
+(include "vector-pair.md")
 (include "dfp.md")
 (include "crypto.md")
 (include "htm.md")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 663f0578f30..f7656c4c5ce 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -605,6 +605,10 @@ mstxvp
 Target Undocumented Var(TARGET_STXVP) Init(1) Save
 Generate (do not generate) the STXVP instruction if -mmma is enabled.
 
+mvector-pair
+Target Undocumented Var(TARGET_VECTOR_PAIR) Init(0) Save
+Generate (do not generate) vector pair instructions for vector_size(32).
+
 mrelative-jumptables
 Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
 
diff --git a/gcc/config/rs6000/t-rs6000 b/gcc/config/rs6000/t-rs6000
index f183b42ce1d..5fc89499795 100644
--- a/gcc/config/rs6000/t-rs6000
+++ b/gcc/config/rs6000/t-rs6000
@@ -128,6 +128,7 @@ MD_INCLUDES = $(srcdir)/config/rs6000/rs64.md \
 	$(srcdir)/config/rs6000/vsx.md \
 	$(srcdir)/config/rs6000/altivec.md \
 	$(srcdir)/config/rs6000/mma.md \
+	$(srcdir)/config/rs6000/vector-pair.md \
 	$(srcdir)/config/rs6000/crypto.md \
 	$(srcdir)/config/rs6000/htm.md \
 	$(srcdir)/config/rs6000/dfp.md \
diff --git a/gcc/config/rs6000/vector-pair.md b/gcc/config/rs6000/vector-pair.md
new file mode 100644
index 00000000000..64e2bb051a7
--- /dev/null
+++ b/gcc/config/rs6000/vector-pair.md
@@ -0,0 +1,536 @@
+;; Vector pair arithmetic and logical instruction support.
+;; Copyright (C) 2020-2023 Free Software Foundation, Inc.
+;; Contributed by Peter Bergner <bergner@linux.ibm.com> and
+;;		  Michael Meissner <meissner@linux.ibm.com>
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published
+;; by the Free Software Foundation; either version 3, or (at your
+;; option) any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but WITHOUT
+;; ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+;; or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+;; License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+;; This file adds support for doing vector operations on pairs of vector
+;; registers.  Most of the patterns use vector pair instructions to load and
+;; possibly store the registers, but are split after register allocation into
+;; two separate operations, one per register.  The second scheduler pass can
+;; interleave other instructions between these pairs of instructions if
+;; possible.
+
+;; Iterator for all vector pair modes
+(define_mode_iterator VPAIR [V32QI V16HI V8SI V4DI V8SF V4DF])
+
+;; Iterator for the integer vector pair modes
+(define_mode_iterator VPAIR_INT [V32QI V16HI V8SI V4DI])
+
+;; Special iterators for NEG: V4SI and V2DI have vneg{w,d}, while V16QI and
+;; V8HI have to use a subtract from 0.
+(define_mode_iterator VPAIR_NEG_VNEG [V4DI V8SI])
+(define_mode_iterator VPAIR_NEG_SUB [V32QI V16HI])
+
+;; Iterator for the floating point vector pair modes
+(define_mode_iterator VPAIR_FP [V8SF V4DF])
+
+;; Iterator doing unary/binary arithmetic on vector pairs.  Split it into
+;; integer and floating point operations.
+(define_code_iterator VPAIR_INT_UNARY   [not])
+(define_code_iterator VPAIR_INT_BINARY  [plus minus smin smax])
+(define_code_iterator VPAIR_INT_LOGICAL [and ior xor])
+
+(define_code_iterator VPAIR_FP_UNARY  [abs neg])
+(define_code_iterator VPAIR_FP_BINARY [plus minus mult smin smax])
+
+;; Give the insn name from the operation
+(define_code_attr vpair_op [(abs      "abs")
+			    (and      "and")
+			    (fma      "fma")
+			    (ior      "ior")
+			    (minus    "sub")
+			    (mult     "mul")
+			    (not      "one_cmpl")
+			    (neg      "neg")
+			    (plus     "add")
+			    (smin     "smin")
+			    (smax     "smax")
+			    (umin     "umin")
+			    (umax     "umax")
+			    (xor      "xor")])
+
+;; Vector pair move support.
+(define_expand "mov<mode>"
+  [(set (match_operand:VPAIR 0 "nonimmediate_operand")
+	(match_operand:VPAIR 1 "input_operand"))]
+  "TARGET_VECTOR_PAIR"
+{
+  rs6000_emit_move (operands[0], operands[1], <MODE>mode);
+  DONE;
+})
+
+(define_insn_and_split "*mov<mode>"
+  [(set (match_operand:VPAIR 0 "nonimmediate_operand" "=wa,wa,m,Qo,wa,wa,wa")
+	(match_operand:VPAIR 1 "input_operand" "m,Qo,wa,wa,wa,j,eP"))]
+  "TARGET_VECTOR_PAIR
+   && (gpc_reg_operand (operands[0], <MODE>mode)
+       || gpc_reg_operand (operands[1], <MODE>mode))"
+  "@
+   lxvp%X1 %x0,%1
+   #
+   stxvp%X0 %x1,%0
+   #
+   #
+   #
+   #"
+  "&& reload_completed
+   && !(MEM_P (operands[0]) && TARGET_STXVP)
+   && !(MEM_P (operands[1]) && TARGET_LXVP)"
+  [(const_int 0)]
+{
+  rs6000_split_multireg_move (operands[0], operands[1]);
+  DONE;
+}
+  [(set_attr "type" "vecload,vecload,vecstore,vecstore,veclogical,
+                     vecperm,vecperm")
+   (set_attr "size" "256")
+   (set_attr "isa" "lxvp,*,stxvp,*,*,*,*")
+   (set_attr "length" "*,8,*,8,8,8,40")])
+
+\f
+;; Vector pair floating point arithmetic unary operations
+(define_insn_and_split "<vpair_op><mode>2"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa")
+	(VPAIR_FP_UNARY:VPAIR_FP
+	 (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			   gen_<vpair_op><vpair_vector>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize negative absolute value (both floating point and integer)
+(define_insn_and_split "nabs<mode>2"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa")
+	(neg:VPAIR_FP
+	 (abs:VPAIR_FP
+	  (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			   gen_vsx_nabs<vpair_vector>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point arithmetic binary operations
+(define_insn_and_split "<vpair_op><mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa")
+	(VPAIR_FP_BINARY:VPAIR_FP
+	 (match_operand:VPAIR_FP 1 "vsx_register_operand" "wa")
+	 (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_<vpair_op><vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point fused multiply-add
+(define_insn_and_split "fma<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(fma:VPAIR_FP
+	 (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	 (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")
+	 (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VPAIR_VECTOR>mode, operands,
+			 gen_fma<vpair_vector>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point fused multiply-subtract
+(define_insn_and_split "fms<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(fma:VPAIR_FP
+	 (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	 (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")
+	 (neg:VPAIR_FP
+	  (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VPAIR_VECTOR>mode, operands,
+			 gen_fms<vpair_vector>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point negative fused multiply-add
+(define_insn_and_split "nfma<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(neg:VPAIR_FP
+	 (fma:VPAIR_FP
+	  (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")
+	  (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VPAIR_VECTOR>mode, operands,
+			 gen_nfma<vpair_vector>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair floating point negative fused multiply-subtract
+(define_insn_and_split "nfms<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(neg:VPAIR_FP
+	 (fma:VPAIR_FP
+	  (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0")
+	  (neg:VPAIR_FP
+	   (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_fma_vector_pair (<VPAIR_VECTOR>mode, operands,
+			 gen_nfms<vpair_vector>4);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) + c into fma (a, b, c)
+(define_insn_and_split "*fma_fpcontract_<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(plus:VPAIR_FP
+	 (mult:VPAIR_FP
+	  (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0"))
+	 (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))]
+  "TARGET_VECTOR_PAIR && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(fma:VPAIR_FP (match_dup 1)
+		      (match_dup 2)
+		      (match_dup 3)))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (a * b) - c into fma (a, b, -c)
+(define_insn_and_split "*fms_fpcontract_<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(minus:VPAIR_FP
+	 (mult:VPAIR_FP
+	  (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	  (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0"))
+	 (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa")))]
+  "TARGET_VECTOR_PAIR && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(fma:VPAIR_FP (match_dup 1)
+		      (match_dup 2)
+		      (neg:VPAIR_FP
+		       (match_dup 3))))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair -((a * b) + c) into -fma (a, b, c)
+(define_insn_and_split "*nfma_fpcontract_<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(neg:VPAIR_FP
+	 (plus:VPAIR_FP
+	  (mult:VPAIR_FP
+	   (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0"))
+	  (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))]
+  "TARGET_VECTOR_PAIR && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(neg:VPAIR_FP
+	 (fma:VPAIR_FP (match_dup 1)
+		       (match_dup 2)
+		       (match_dup 3))))]
+{
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair -((a * b) - c) into -fma (a, b, -c)
+(define_insn_and_split "*nfms_fpcontract_<mode>3"
+  [(set (match_operand:VPAIR_FP 0 "vsx_register_operand" "=wa,wa")
+	(neg:VPAIR_FP
+	 (minus:VPAIR_FP
+	  (mult:VPAIR_FP
+	   (match_operand:VPAIR_FP 1 "vsx_register_operand" "%wa,wa")
+	   (match_operand:VPAIR_FP 2 "vsx_register_operand" "wa,0"))
+	  (match_operand:VPAIR_FP 3 "vsx_register_operand" "0,wa"))))]
+  "TARGET_VECTOR_PAIR && flag_fp_contract_mode == FP_CONTRACT_FAST"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(neg:VPAIR_FP
+	 (fma:VPAIR_FP (match_dup 1)
+		       (match_dup 2)
+		       (neg:VPAIR_FP
+			(match_dup 3)))))]
+{
+}
+  [(set_attr "length" "8")])
+
+\f
+;; Vector pair integer arithmetic unary operations
+(define_insn_and_split "<vpair_op><mode>2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(VPAIR_INT_UNARY:VPAIR_INT
+	 (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			   gen_<vpair_op><vpair_vector>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair negate if we have the VNEGx instruction.
+(define_insn_and_split "neg<mode>2"
+  [(set (match_operand:VPAIR_NEG_VNEG 0 "vsx_register_operand" "=v")
+	(neg:VPAIR_NEG_VNEG
+	 (match_operand:VPAIR_NEG_VNEG 1 "vsx_register_operand" "v")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_unary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			   gen_neg<vpair_vector>2);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair negate if we have to do a subtract from 0
+(define_insn_and_split "neg<mode>2"
+  [(set (match_operand:VPAIR_NEG_SUB 0 "vsx_register_operand" "=v")
+	(neg:VPAIR_NEG_SUB
+	 (match_operand:VPAIR_NEG_SUB 1 "vsx_register_operand" "v")))
+   (clobber (match_scratch:<VPAIR_VECTOR> 2 "=&v"))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  enum machine_mode mode = <VPAIR_VECTOR>mode;
+  rtx tmp = operands[2];
+  unsigned reg0 = reg_or_subregno (operands[0]);
+  unsigned reg1 = reg_or_subregno (operands[1]);
+
+  emit_move_insn (tmp, CONST0_RTX (mode));
+  emit_insn (gen_sub<vpair_vector>3 (gen_rtx_REG (mode, reg0),
+				     tmp,
+				     gen_rtx_REG (mode, reg1)));
+
+  emit_insn (gen_sub<vpair_vector>3 (gen_rtx_REG (mode, reg0 + 1),
+				     tmp,
+				     gen_rtx_REG (mode, reg1 + 1)));
+
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair integer arithmetic binary operations
+(define_insn_and_split "<vpair_op><mode>3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=v")
+	(VPAIR_INT_BINARY:VPAIR_INT
+	 (match_operand:VPAIR_INT 1 "vsx_register_operand" "v")
+	 (match_operand:VPAIR_INT 2 "vsx_register_operand" "v")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_<vpair_op><vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Vector pair integer arithmetic logical operations
+(define_insn_and_split "<vpair_op><mode>3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(VPAIR_INT_LOGICAL:VPAIR_INT
+	 (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+	 (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_<vpair_op><vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair ~(a | b) or ((~a) & (~b)) to produce xxlnor
+(define_insn_and_split "*nor<mode>3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(not:VPAIR_INT
+	 (ior:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+	  (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_nor<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "*nor<mode>3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(and:VPAIR_INT
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_nor<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (~a) & b to use xxlandc
+(define_insn_and_split "*andc<mode>3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(and:VPAIR_INT
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+	 (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_andc<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair ~(a ^ b) to produce xxleqv
+(define_insn_and_split "*eqv<mode>3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(not:VPAIR_INT
+	 (xor:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+	  (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_eqv<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+
+;; Optimize vector pair ~(a & b) or ((~a) | (~b)) to produce xxlnand
+(define_insn_and_split "*nand<mode>3_1"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(not:VPAIR_INT
+	 (and:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa")
+	  (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_nand<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+(define_insn_and_split "*nand<mode>3_2"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(ior:VPAIR_INT
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa"))))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_nand<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
+
+;; Optimize vector pair (~a) | b to produce xxlorc
+(define_insn_and_split "*orc<mode>3"
+  [(set (match_operand:VPAIR_INT 0 "vsx_register_operand" "=wa")
+	(ior:VPAIR_INT
+	 (not:VPAIR_INT
+	  (match_operand:VPAIR_INT 1 "vsx_register_operand" "wa"))
+	 (match_operand:VPAIR_INT 2 "vsx_register_operand" "wa")))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_binary_vector_pair (<VPAIR_VECTOR>mode, operands,
+			    gen_orc<vpair_vector>3);
+  DONE;
+}
+  [(set_attr "length" "8")])
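The logical patterns above recognize two RTL spellings each for NOR and NAND (the De Morgan forms) and one spelling for EQV. A minimal C sketch of those identities on a 32-byte vector follows. It uses only the generic GCC vector extension, so it does not depend on -mvector-pair; the type name `v4di_t` and both helper functions are hypothetical names chosen for illustration, not code from this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 32-byte vector type, declared the same way as the
   vectype_t typedefs in the tests of this patch.  */
typedef unsigned long long v4di_t __attribute__ ((__vector_size__ (32)));

/* Assert that X and Y hold the same value in each of the 4 elements.  */
static void
assert_same (const v4di_t *x, const v4di_t *y)
{
  for (size_t i = 0; i < 4; i++)
    assert ((*x)[i] == (*y)[i]);
}

/* Check the bitwise identities behind the *nor, *nand and *eqv patterns.  */
static void
check_logical_identities (const v4di_t *a, const v4di_t *b)
{
  v4di_t nor_1 = ~(*a | *b);		/* Form matched by *nor<mode>3_1.  */
  v4di_t nor_2 = (~*a) & (~*b);		/* Form matched by *nor<mode>3_2.  */
  v4di_t nand_1 = ~(*a & *b);		/* Form matched by *nand<mode>3_1.  */
  v4di_t nand_2 = (~*a) | (~*b);	/* Form matched by *nand<mode>3_2.  */
  v4di_t eqv = ~(*a ^ *b);		/* Form matched by *eqv<mode>3.  */
  v4di_t agree = (*a & *b) | ((~*a) & (~*b));	/* Bits where A and B agree.  */

  assert_same (&nor_1, &nor_2);
  assert_same (&nand_1, &nand_2);
  assert_same (&eqv, &agree);
}
```

Calling `check_logical_identities (&a, &b)` on any two inputs should pass. The vectors are handled through pointers, as in the testsuite files, so the sketch avoids passing 32-byte vectors by value.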
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index f4fc620b653..3d713f6b7f8 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -71,11 +71,17 @@
 (define_mode_iterator VI [V4SI V8HI V16QI])
 
 ;; Base type from vector mode
-(define_mode_attr VEC_base [(V16QI "QI")
+(define_mode_attr VEC_base [(V32QI "QI")
+			    (V16QI "QI")
+			    (V16HI "HI")
 			    (V8HI  "HI")
+			    (V8SI  "SI")
 			    (V4SI  "SI")
+			    (V4DI  "DI")
 			    (V2DI  "DI")
+			    (V8SF  "SF")
 			    (V4SF  "SF")
+			    (V4DF  "DF")
 			    (V2DF  "DF")
 			    (V1TI  "TI")
 			    (TI    "TI")])
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 9011a3f7e40..26ecab6f089 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -213,14 +213,20 @@
 
 ;; Mode attribute to give the correct predicate for ISA 3.0 vector extract and
 ;; insert to validate the operand number.
-(define_mode_attr VSX_EXTRACT_PREDICATE [(V16QI "const_0_to_15_operand")
+(define_mode_attr VSX_EXTRACT_PREDICATE [(V32QI "const_0_to_31_operand")
+					 (V16QI "const_0_to_15_operand")
+					 (V16HI "const_0_to_15_operand")
 					 (V8HI  "const_0_to_7_operand")
+					 (V8SI  "const_0_to_7_operand")
 					 (V4SI  "const_0_to_3_operand")])
 
 ;; Mode attribute to give the constraint for vector extract and insert
 ;; operations.
-(define_mode_attr VSX_EX [(V16QI "v")
+(define_mode_attr VSX_EX [(V32QI "v")
+			  (V16QI "v")
+			  (V16HI "v")
 			  (V8HI  "v")
+			  (V8SI  "wa")
 			  (V4SI  "wa")])
 
 ;; Mode iterator for binary floating types other than double to
@@ -259,6 +265,30 @@
 ;; and Vector Integer Multiply/Divide/Modulo Instructions
 (define_mode_iterator VIlong [V2DI V4SI])
 
+;; Iterator for extraction from vector pair modes with 64-bit elements
+(define_mode_iterator VPAIR_V4DI_V4DF [V4DI V4DF])
+
+;; Iterator for the small integer vector pair modes
+(define_mode_iterator VPAIR_SMALL_INT [V32QI V16HI V8SI])
+
+;; Map vector pair mode to vector mode in upper case after the vector pair is
+;; split to two vectors.
+(define_mode_attr VPAIR_VECTOR [(V32QI "V16QI")
+				(V16HI "V8HI")
+				(V8SI  "V4SI")
+				(V4DI  "V2DI")
+				(V8SF  "V4SF")
+				(V4DF  "V2DF")])
+
+;; Map vector pair mode to vector mode in lower case after the vector pair is
+;; split to two vectors.
+(define_mode_attr vpair_vector [(V32QI "v16qi")
+				(V16HI "v8hi")
+				(V8SI  "v4si")
+				(V4DI  "v2di")
+				(V8SF  "v4sf")
+				(V4DF  "v2df")])
+
 ;; Constants for creating unspecs
 (define_c_enum "unspec"
   [UNSPEC_VSX_CONCAT
@@ -3545,6 +3575,33 @@
 }
   [(set_attr "type" "fpload,load")])
 
+;; Extract DF/DI from V4DF/V4DI, converting it into an extract from V2DF/V2DI.
+(define_insn_and_split "vsx_extract_<mode>"
+  [(set (match_operand:<VEC_base> 0 "gpc_reg_operand" "=wa,r")
+	(vec_select:<VEC_base>
+	 (match_operand:VPAIR_V4DI_V4DF 1 "gpc_reg_operand" "wa,wa")
+	 (parallel
+	  [(match_operand:QI 2 "const_0_to_3_operand" "n,n")])))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(set (match_dup 0)
+	(vec_select:<VEC_base>
+	 (match_dup 3)
+	 (parallel [(match_dup 4)])))]
+{
+  HOST_WIDE_INT element = INTVAL (operands[2]);
+  unsigned reg_num = reg_or_subregno (operands[1]);
+
+  if ((WORDS_BIG_ENDIAN && element >= 2)
+      || (!WORDS_BIG_ENDIAN && element < 2))
+    reg_num++;
+
+  operands[3] = gen_rtx_REG (<VPAIR_VECTOR>mode, reg_num);
+  operands[4] = GEN_INT (element & 1);
+}
+  [(set_attr "type" "mfvsr,vecperm")])
+
 ;; Extract a SF element from V4SF
 (define_insn_and_split "vsx_extract_v4sf"
   [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
@@ -3632,6 +3689,35 @@
 }
   [(set_attr "type" "fpload,load")])
 
+;; Extract SF from V8SF, converting it into an extract from V4SF
+(define_insn_and_split "vsx_extract_v8sf"
+  [(set (match_operand:SF 0 "vsx_register_operand" "=wa")
+	(vec_select:SF
+	 (match_operand:V8SF 1 "vsx_register_operand" "wa")
+	 (parallel [(match_operand:QI 2 "const_0_to_7_operand" "n")])))
+   (clobber (match_scratch:V4SF 3 "=0"))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0)
+		   (vec_select:SF
+		    (match_dup 4)
+		    (parallel [(match_dup 5)])))
+	      (clobber (match_dup 3))])]
+{
+  HOST_WIDE_INT element = INTVAL (operands[2]);
+  unsigned reg_num = reg_or_subregno (operands[1]);
+
+  if ((WORDS_BIG_ENDIAN && element >= 4)
+      || (!WORDS_BIG_ENDIAN && element < 4))
+    reg_num++;
+
+  operands[4] = gen_rtx_REG (V4SFmode, reg_num);
+  operands[5] = GEN_INT (element & 3);
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "fp")])
+
 ;; Expand the builtin form of xxpermdi to canonical rtl.
 (define_expand "vsx_xxpermdi_<mode>"
   [(match_operand:VSX_L 0 "vsx_register_operand")
@@ -4074,6 +4160,36 @@
 }
   [(set_attr "type" "load")])
 
+;; Extract SI/HI/QI from V8SI/V16HI/V32QI, converting it into an extract from a
+;; single vector.
+(define_insn_and_split "vsx_extract_<mode>"
+  [(set (match_operand:<VEC_base> 0 "gpc_reg_operand" "=r,<VSX_EX>")
+	(vec_select:<VEC_base>
+	 (match_operand:VPAIR_SMALL_INT 1 "gpc_reg_operand" "v,<VSX_EX>")
+	 (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "n,n")])))
+   (clobber (match_scratch:SI 3 "=r,X"))]
+  "TARGET_VECTOR_PAIR"
+  "#"
+  "&& reload_completed"
+  [(parallel [(set (match_dup 0)
+		   (vec_select:<VEC_base>
+		    (match_dup 4)
+		    (parallel [(match_dup 5)])))
+	      (clobber (match_dup 3))])]
+{
+  HOST_WIDE_INT element = INTVAL (operands[2]);
+  HOST_WIDE_INT nunits = GET_MODE_NUNITS (<VPAIR_VECTOR>mode);
+  unsigned reg_num = reg_or_subregno (operands[1]);
+
+  if ((WORDS_BIG_ENDIAN && element >= nunits)
+      || (!WORDS_BIG_ENDIAN && element < nunits))
+    reg_num++;
+
+  operands[4] = gen_rtx_REG (<VPAIR_VECTOR>mode, reg_num);
+  operands[5] = GEN_INT (element & (nunits - 1));
+}
+  [(set_attr "type" "vecsimple")])
+
 ;; ISA 3.1 extract
 (define_expand "vextractl<mode>"
   [(set (match_operand:V2DI 0 "altivec_register_operand")
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c
new file mode 100644
index 00000000000..e18539875c4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-1.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 4 double elements.  */
+
+typedef double vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xvadddp, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xvsubdp, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_multiply (vectype_t *dest,
+	       vectype_t *a,
+	       vectype_t *b)
+{
+  /* 2 lxvp, 2 xvmuldp, 1 stxvp.  */
+  *dest = *a * *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xvnegdp, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_fma (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b,
+	  vectype_t *c)
+{
+  /* 2 lxvp, 2 xvmadd{a,m}dp, 1 stxvp.  */
+  *dest = (*a * *b) + *c;
+}
+
+void
+test_fms (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b,
+	  vectype_t *c)
+{
+  /* 2 lxvp, 2 xvmsub{a,m}dp, 1 stxvp.  */
+  *dest = (*a * *b) - *c;
+}
+
+void
+test_nfma (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b,
+	   vectype_t *c)
+{
+  /* 2 lxvp, 2 xvnmadddp, 1 stxvp.  */
+  *dest = -((*a * *b) + *c);
+}
+
+void
+test_nfms (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b,
+	   vectype_t *c)
+{
+  /* 2 lxvp, 2 xvnmsubdp, 1 stxvp.  */
+  *dest = -((*a * *b) - *c);
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c
new file mode 100644
index 00000000000..7093a2d8d61
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-2.c
@@ -0,0 +1,96 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 8 float elements.  */
+
+typedef float vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xvaddsp, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xvsubsp, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_multiply (vectype_t *dest,
+	       vectype_t *a,
+	       vectype_t *b)
+{
+  /* 2 lxvp, 2 xvmulsp, 1 stxvp.  */
+  *dest = *a * *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xvnegsp, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_fma (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b,
+	  vectype_t *c)
+{
+  /* 2 lxvp, 2 xvmadd{a,m}sp, 1 stxvp.  */
+  *dest = (*a * *b) + *c;
+}
+
+void
+test_fms (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b,
+	  vectype_t *c)
+{
+  /* 2 lxvp, 2 xvmsub{a,m}sp, 1 stxvp.  */
+  *dest = (*a * *b) - *c;
+}
+
+void
+test_nfma (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b,
+	   vectype_t *c)
+{
+  /* 2 lxvp, 2 xvnmaddsp, 1 stxvp.  */
+  *dest = -((*a * *b) + *c);
+}
+
+void
+test_nfms (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b,
+	   vectype_t *c)
+{
+  /* 2 lxvp, 2 xvnmsubsp, 1 stxvp.  */
+  *dest = -((*a * *b) - *c);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}       19 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}       8 } } */
+/* { dg-final { scan-assembler-times {\mxvaddsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvmadd.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmsub.sp\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxvmulsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvnegsp\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmadd.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvnmsub.sp\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxvsubsp\M}     2 } } */
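The fms, nfma and nfms tests above rely on the sign conventions used by the fpcontract patterns: (a * b) - c is fma (a, b, -c), and the two negated forms negate the whole result. A minimal C sketch of those relationships on 8-element float vectors follows. It uses the generic GCC vector extension with small integer-valued inputs, so results are exact whether or not the compiler fuses the multiply and add. The type name `v8sf_t` and all function names are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 32-byte vector of 8 floats, as in vector-size-32-2.c.  */
typedef float v8sf_t __attribute__ ((__vector_size__ (32)));

/* The four source forms used by the tests, written through pointers.  */
static void
vp_fma (v8sf_t *d, const v8sf_t *a, const v8sf_t *b, const v8sf_t *c)
{
  *d = (*a * *b) + *c;
}

static void
vp_fms (v8sf_t *d, const v8sf_t *a, const v8sf_t *b, const v8sf_t *c)
{
  *d = (*a * *b) - *c;
}

static void
vp_nfma (v8sf_t *d, const v8sf_t *a, const v8sf_t *b, const v8sf_t *c)
{
  *d = -((*a * *b) + *c);
}

static void
vp_nfms (v8sf_t *d, const v8sf_t *a, const v8sf_t *b, const v8sf_t *c)
{
  *d = -((*a * *b) - *c);
}

/* Check that fms is fma with -c, and that nfma/nfms negate fma/fms.  */
static void
check_fma_signs (const v8sf_t *a, const v8sf_t *b, const v8sf_t *c)
{
  v8sf_t neg_c = -*c;
  v8sf_t r_fma, r_fms, r_nfma, r_nfms, r_fma_neg_c;

  vp_fma (&r_fma, a, b, c);
  vp_fms (&r_fms, a, b, c);
  vp_nfma (&r_nfma, a, b, c);
  vp_nfms (&r_nfms, a, b, c);
  vp_fma (&r_fma_neg_c, a, b, &neg_c);

  for (size_t i = 0; i < 8; i++)
    {
      assert (r_fms[i] == r_fma_neg_c[i]);
      assert (r_nfma[i] == -r_fma[i]);
      assert (r_nfms[i] == -r_fms[i]);
    }
}
```

The exact comparisons hold only because the inputs are small integers. With arbitrary inputs a fused and an unfused evaluation can differ in the last bit, so such a check would need to fix the contraction mode first.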
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c
new file mode 100644
index 00000000000..f1e2a038ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-3.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 4 64-bit integer elements.  */
+
+typedef long long vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vaddudm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubudm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a)
+{
+  /* 2 lxvp, 2 vnegd, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+	  vectype_t *a)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+	 vectype_t *a,
+	 vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}    24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}   13 } } */
+/* { dg-final { scan-assembler-times {\mvaddudm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvnegd\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsubudm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c
new file mode 100644
index 00000000000..146c052dd18
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-4.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 8 32-bit integer elements.  */
+
+typedef int vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vadduwm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubuwm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a)
+{
+  /* 2 lxvp, 2 vnegw, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+	  vectype_t *a)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+	 vectype_t *a,
+	 vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}    24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}   13 } } */
+/* { dg-final { scan-assembler-times {\mvadduwm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvnegw\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuwm\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}   2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c
new file mode 100644
index 00000000000..9d5583fc7d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-5.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 16 16-bit integer elements.  */
+
+typedef short vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vadduhm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vsubuhm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a)
+{
+  /* 2 lxvp, 1 xxspltib, 2 vsubuhm, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+	  vectype_t *a)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+	 vectype_t *a,
+	 vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}     24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}    13 } } */
+/* { dg-final { scan-assembler-times {\mvadduhm\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mvsubuhm\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}   4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}   2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  1 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c b/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c
new file mode 100644
index 00000000000..ede85d91ac0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vector-size-32-6.c
@@ -0,0 +1,137 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mvector-pair" } */
+
+/* Test whether the __attribute__((__vector_size__(32))) generates paired vector
+   loads and stores with the -mvector-pair option.  This file tests 32-byte
+   vectors with 32 8-bit unsigned integer elements.  */
+
+typedef unsigned char vectype_t __attribute__((__vector_size__(32)));
+
+void
+test_add (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vaddubm, 1 stxvp.  */
+  *dest = *a + *b;
+}
+
+void
+test_sub (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 vsububm, 1 stxvp.  */
+  *dest = *a - *b;
+}
+
+void
+test_negate (vectype_t *dest,
+	     vectype_t *a)
+{
+  /* 1 lxvp, 1 xxspltib, 2 vsububm, 1 stxvp.  */
+  *dest = - *a;
+}
+
+void
+test_not (vectype_t *dest,
+	  vectype_t *a)
+{
+  /* 1 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~ *a;
+}
+
+void
+test_and (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxland, 1 stxvp.  */
+  *dest = *a & *b;
+}
+
+void
+test_or (vectype_t *dest,
+	 vectype_t *a,
+	 vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlor, 1 stxvp.  */
+  *dest = *a | *b;
+}
+
+void
+test_xor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlxor, 1 stxvp.  */
+  *dest = *a ^ *b;
+}
+
+void
+test_andc_1 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = (~ *a) & *b;
+}
+
+void
+test_andc_2 (vectype_t *dest,
+	     vectype_t *a,
+	     vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlandc, 1 stxvp.  */
+  *dest = *a & (~ *b);
+}
+
+void
+test_orc_1 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = (~ *a) | *b;
+}
+
+void
+test_orc_2 (vectype_t *dest,
+	    vectype_t *a,
+	    vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlorc, 1 stxvp.  */
+  *dest = *a | (~ *b);
+}
+
+void
+test_nand (vectype_t *dest,
+	   vectype_t *a,
+	   vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnand, 1 stxvp.  */
+  *dest = ~(*a & *b);
+}
+
+void
+test_nor (vectype_t *dest,
+	  vectype_t *a,
+	  vectype_t *b)
+{
+  /* 2 lxvp, 2 xxlnor, 1 stxvp.  */
+  *dest = ~(*a | *b);
+}
+
+/* { dg-final { scan-assembler-times {\mlxvp\M}      24 } } */
+/* { dg-final { scan-assembler-times {\mstxvp\M}     13 } } */
+/* { dg-final { scan-assembler-times {\mvaddubm\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mvsububm\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxland\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxxlandc\M}    4 } } */
+/* { dg-final { scan-assembler-times {\mxxlnand\M}    2 } } */
+/* { dg-final { scan-assembler-times {\mxxlnor\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxxlor\M}      2 } } */
+/* { dg-final { scan-assembler-times {\mxxlorc\M}     4 } } */
+/* { dg-final { scan-assembler-times {\mxxlxor\M}     2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}   1 } } */
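The expected instruction counts in vector-size-32-6.c rest on a few identities: the test comments expect negation to use a splat plus vsububm (presumably computing zero minus the operand), expect a bare NOT to use xxlnor, and expect the nand and nor tests to collapse to a single logical instruction per 16-byte half. The sketch below is not part of the patch. It checks those identities with the generic GCC vector extension on any target, using the same vectype_t typedef as the test. The names count_mismatches and check_logical_identities are hypothetical helpers, and the code says nothing about which instructions a given compiler actually emits.

```c
#include <string.h>

/* Same 32-byte vector type as the test file: 32 unsigned char elements.  */
typedef unsigned char vectype_t __attribute__((__vector_size__(32)));

/* Hypothetical helper: number of bytes in which *x and *y differ.  */
static int
count_mismatches (const vectype_t *x, const vectype_t *y)
{
  unsigned char xb[sizeof (vectype_t)], yb[sizeof (vectype_t)];
  int n = 0;

  memcpy (xb, x, sizeof xb);
  memcpy (yb, y, sizeof yb);
  for (size_t i = 0; i < sizeof xb; i++)
    n += (xb[i] != yb[i]);
  return n;
}

/* Hypothetical self-check: returns the total number of mismatching bytes
   across all identities, so 0 means every identity held.  */
int
check_logical_identities (void)
{
  vectype_t a = { 0 }, b = { 0 }, zero = { 0 }, lhs, rhs;
  int bad = 0;

  /* Arbitrary element values, chosen only to exercise varied bit patterns.  */
  for (int i = 0; i < 32; i++)
    {
      a[i] = (unsigned char) (i * 7 + 3);
      b[i] = (unsigned char) (255 - i * 5);
    }

  /* Negate is zero minus the operand (modular byte arithmetic).  */
  lhs = -a;
  rhs = zero - a;
  bad += count_mismatches (&lhs, &rhs);

  /* NOT is NOR of the operand with itself.  */
  lhs = ~a;
  rhs = ~(a | a);
  bad += count_mismatches (&lhs, &rhs);

  /* NAND and NOR via De Morgan's laws.  */
  lhs = ~(a & b);
  rhs = ~a | ~b;
  bad += count_mismatches (&lhs, &rhs);

  lhs = ~(a | b);
  rhs = ~a & ~b;
  bad += count_mismatches (&lhs, &rhs);

  return bad;
}
```

Calling check_logical_identities () from a small driver and checking for a zero return confirms the identities. Vectors are passed by pointer so that the sketch does not depend on how a particular target passes 32-byte vectors by value.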
