[gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.

public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-14 15:56 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-14 15:56 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:45bdcaa1289387f51566141287ee6d0b210a4738

commit 45bdcaa1289387f51566141287ee6d0b210a4738
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 11:56:20 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added a new constraint (eD) to match scalar and vector constants that can be
    loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    Some framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This way for
    instance in easy_fp_constant and easy_vector_constant when we first check
    whether the constant can be generated by XXSPLTIDP, we don't have to build the
    128-bits of the vector for each successive test.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-14  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  36 ++
 gcc/config/rs6000/rs6000-protos.h                  |  22 ++
 gcc/config/rs6000/rs6000.c                         | 367 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 +++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  65 +++-
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 ++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++
 14 files changed, 845 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..d4b50276bac 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,23 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit vector constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a
+;; V2DFmode or V2DI result.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we generate the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  return (vec_const_to_bytes (op, mode, &vec_const)
+	  && vec_const_use_xxspltidp (&vec_const));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +683,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..df4ae364bfb 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,27 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS	128
+#define VECTOR_CONST_BYTES	(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_16BIT	(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_32BIT	(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_64BIT	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT];
+  unsigned int words[VECTOR_CONST_32BIT];
+  unsigned short h_words[VECTOR_CONST_16BIT];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+  machine_mode orig_mode;		/* Original mode.  */
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..3ec59ed2a5e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,328 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_32BIT];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0];
+  unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1];
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer.  */
+  if (df_upper == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = DImode;
+
+  if (!FLOAT_MODE_P (mode) && IN_RANGE (df_upper, -128, 127))
+    return false;
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and signalling NaN bit pattern.  Recognize infinity and negative
+     infinity.
+
+     The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the
+     exponent, and 52 bits for the mantissa (not counting the hidden bit used
+     for normal numbers).  NaN values have the exponent set to all 1 bits, and
+     the mantissa non-zero (mantissa == 0 is infinity).  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (df_upper != VECTOR_CONST_DF_NAN
+      && df_upper != VECTOR_CONST_DF_NANS
+      && df_upper != VECTOR_CONST_DF_INF
+      && df_upper != VECTOR_CONST_DF_NEG_INF)
+    {
+      int df_exponent = (df_upper >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* Set up the vector bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_16BIT; i++)
+    vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8)
+			     | vec_const->bytes[2*i + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_32BIT; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4*i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_64BIT; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8*i) + j];
+
+      vec_const->d_words[i] = d_word;
+    }
+
+  /* Remember original mode that the vector/scalar used.  */
+  vec_const->orig_mode = mode;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..cf42b6d2058 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;        XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..6be3376f5d1 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                wD,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eD,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
@@ -6458,6 +6478,39 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_mode_iterator XXSPLTIDP [DI SF DF])
+
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  rs6000_vec_const vec_const;
+  if (!vec_const_to_bytes (operands[1], <MODE>mode, &vec_const)
+      || !vec_const_use_xxspltidp (&vec_const))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-21  2:33 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-21  2:33 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:68023da085f00340d435111d31e024c38a6fc892

commit 68023da085f00340d435111d31e024c38a6fc892
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 22:32:41 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    This patch depends on the previous patch that addes the rs6000_const structure
    and the constant_to_bytes function.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-msplat-float-constant) to control whether or
    not the XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    2021-10-20  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (constant_generates_xxspltidp): New declaration.
            * config/rs6000/rs6000.c (prefixed_xxsplti_p): New function.
            (constant_generates_xxspltidp): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  55 ++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 153 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  85 +++++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 ++++-
 gcc/doc/md.texi                                    |   4 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 14 files changed, 557 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..fefa420ed67 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_const vsx_const;
+  if (TARGET_POWER10
+      && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,42 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  rs6000_const vsx_const;
+  if (!constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    return false;
+
+  if (constant_generates_xxspltidp (&vsx_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +698,16 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy.  */
+      rs6000_const vsx_const;
+      if (TARGET_POWER10
+	  && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_NO_SPLAT))
+	{
+	  if (constant_generates_xxspltidp (&vsx_const))
+	    return true;
+	}
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index a3fecbb7812..ec4f78d9241 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -257,6 +258,7 @@ typedef struct {
 
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
+extern unsigned constant_generates_xxspltidp (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e2f48f5a1e2..b041db3c728 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,21 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (TARGET_PREFIXED)
+	{
+	  rs6000_const vsx_const;
+	  if (constant_to_bytes (vec, mode, &vsx_const,
+				 RS6000_CONST_SPLAT_16_BYTES))
+	    {
+	      unsigned imm = constant_generates_xxspltidp (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltidp %x0,%2";
+		}
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26739,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
+   from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_const vsx_const;
+  if (constant_to_bytes (src, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28852,6 +28902,109 @@ constant_to_bytes (rtx op,
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
+   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIDP instruction.  */
+
+unsigned
+constant_generates_xxspltidp (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Only recognize XXSPLTIDP for 16-byte vector constants (or 8-byte scalar
+     constants that have been splatted to 128-bits).  */
+  if (vsx_const->total_size != 16)
+    return 0;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vsx_const->all_double_words_same)
+    return 0;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vsx_const->all_bytes_same
+      || vsx_const->all_half_words_same
+      || vsx_const->all_words_same)
+    return 0;
+
+  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != RS6000_CONST_DF_NAN
+      && value != RS6000_CONST_DF_NANS
+      && value != RS6000_CONST_DF_INF
+      && value != RS6000_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return 0;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return 0;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return 0;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return 0;
+
+  /* Return the immediate to be used.  */
+  return sf_value;
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..2633ad9f815 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,46 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_const vsx_const;
+
+  if (!constant_to_bytes (src, <MODE>mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    gcc_unreachable ();
+
+  unsigned imm = constant_generates_xxspltidp (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..429da57d19d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+msplat-float-constant
+Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIDP instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..0ceecc1975c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTI*
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..13b56279565 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eP
+A scalar floating point constant or a vector constant that can be
+loaded with one prefixed instruction to a VSX register.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-20 22:44 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-20 22:44 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:995cded5432990faf46df1d5e1b2aab334a406ff

commit 995cded5432990faf46df1d5e1b2aab334a406ff
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 18:44:31 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    This patch depends on the previous patch that addes the rs6000_const structure
    and the constant_to_bytes function.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-msplat-float-constant) to control whether or
    not the XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    2021-10-20  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (constant_generates_xxspltidp): New declaration.
            * config/rs6000/rs6000.c (prefixed_xxsplti_p): New function.
            (constant_generates_xxspltidp): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  55 ++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 153 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  85 +++++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 ++++-
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 13 files changed, 553 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..fefa420ed67 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_const vsx_const;
+  if (TARGET_POWER10
+      && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,42 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  rs6000_const vsx_const;
+  if (!constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    return false;
+
+  if (constant_generates_xxspltidp (&vsx_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +698,16 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy.  */
+      rs6000_const vsx_const;
+      if (TARGET_POWER10
+	  && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_NO_SPLAT))
+	{
+	  if (constant_generates_xxspltidp (&vsx_const))
+	    return true;
+	}
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index a3fecbb7812..ec4f78d9241 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -257,6 +258,7 @@ typedef struct {
 
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
+extern unsigned constant_generates_xxspltidp (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e2f48f5a1e2..b041db3c728 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,21 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (TARGET_PREFIXED)
+	{
+	  rs6000_const vsx_const;
+	  if (constant_to_bytes (vec, mode, &vsx_const,
+				 RS6000_CONST_SPLAT_16_BYTES))
+	    {
+	      unsigned imm = constant_generates_xxspltidp (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltidp %x0,%2";
+		}
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26739,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
+   from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_const vsx_const;
+  if (constant_to_bytes (src, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28852,6 +28902,109 @@ constant_to_bytes (rtx op,
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
+   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIDP instruction.  */
+
+unsigned
+constant_generates_xxspltidp (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Only recognize XXSPLTIDP for 16-byte vector constants (or 8-byte scalar
+     constants that have been splatted to 128-bits).  */
+  if (vsx_const->total_size != 16)
+    return 0;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vsx_const->all_double_words_same)
+    return 0;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vsx_const->all_bytes_same
+      || vsx_const->all_half_words_same
+      || vsx_const->all_words_same)
+    return 0;
+
+  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != RS6000_CONST_DF_NAN
+      && value != RS6000_CONST_DF_NANS
+      && value != RS6000_CONST_DF_INF
+      && value != RS6000_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return 0;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return 0;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return 0;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return 0;
+
+  /* Return the immediate to be used.  */
+  return sf_value;
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..2633ad9f815 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,46 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_const vsx_const;
+
+  if (!constant_to_bytes (src, <MODE>mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    gcc_unreachable ();
+
+  unsigned imm = constant_generates_xxspltidp (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..429da57d19d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+msplat-float-constant
+Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIDP instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..0ceecc1975c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTI*
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-20 21:46 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-20 21:46 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4119bec66ccc4ea84b7a01182004c65ad3ca070a

commit 4119bec66ccc4ea84b7a01182004c65ad3ca070a
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 17:46:00 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    This patch depends on the previous patch that addes the rs6000_const structure
    and the constant_to_bytes function.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-msplat-float-constant) to control whether or
    not the XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    2021-10-20  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (constant_generates_xxspltidp): New declaration.
            * config/rs6000/rs6000.c (prefixed_xxsplti_p): New function.
            (constant_generates_xxspltidp): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  57 ++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 149 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  86 +++++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 ++++-
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 13 files changed, 552 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..b97b20a1c88 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,16 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_const vsx_const;
+
+  if (TARGET_POWER10
+      && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +619,42 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  rs6000_const vsx_const;
+  if (!constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    return false;
+
+  if (constant_generates_xxspltidp (&vsx_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +699,17 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy.  */
+      rs6000_const vsx_const;
+
+      if (TARGET_POWER10
+	  && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_NO_SPLAT))
+	{
+	  if (constant_generates_xxspltidp (&vsx_const))
+	    return true;
+	}
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index a3fecbb7812..ec4f78d9241 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -257,6 +258,7 @@ typedef struct {
 
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
+extern unsigned constant_generates_xxspltidp (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e2f48f5a1e2..40c7e5ceddf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,21 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (TARGET_PREFIXED)
+	{
+	  rs6000_const vsx_const;
+	  if (constant_to_bytes (vec, mode, &vsx_const,
+				 RS6000_CONST_SPLAT_16_BYTES))
+	    {
+	      unsigned imm = constant_generates_xxspltidp (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltidp %x0,%2";
+		}
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26739,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
+   from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_const vsx_const;
+  if (constant_to_bytes (src, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28852,6 +28902,105 @@ constant_to_bytes (rtx op,
   return true;
 }
 
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
+   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIDP instruction.  */
+
+unsigned
+constant_generates_xxspltidp (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vsx_const->all_double_words_same)
+    return 0;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vsx_const->all_bytes_same
+      || vsx_const->all_half_words_same
+      || vsx_const->all_words_same)
+    return 0;
+
+  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != RS6000_CONST_DF_NAN
+      && value != RS6000_CONST_DF_NANS
+      && value != RS6000_CONST_DF_INF
+      && value != RS6000_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return 0;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return 0;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return 0;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return 0;
+
+  /* Return the immediate to be used.  */
+  return sf_value;
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..218645aa240 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,47 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_const vsx_const;
+
+  if (!constant_to_bytes (src, <MODE>mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    gcc_unreachable ();
+
+  unsigned imm = constant_generates_xxspltidp (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
+
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..429da57d19d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+msplat-float-constant
+Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIDP instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..0ceecc1975c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTI*
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-18 17:53 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-18 17:53 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:b283d9e4e523c8b46f5f414d7d9aed8ea9d33b0c

commit b283d9e4e523c8b46f5f414d7d9aed8ea9d33b0c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 18 13:52:46 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    A framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This
    makes it easy for easy_fp_constant and easy_vector_constant to have
    multiple checks (such as for XXSPLTIW, LXVKQ, and XXSPLTI32DX) that each
    want to build the bitmask of what the vector constant looks like.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-18  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            * constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  51 +++
 gcc/config/rs6000/rs6000-protos.h                  |  27 ++
 gcc/config/rs6000/rs6000.c                         | 387 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  86 ++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 +-
 gcc/doc/md.texi                                    |   4 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 14 files changed, 813 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..4b2bbdf40e8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,38 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxspltidp (&vec_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +698,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..8eef955237a 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,32 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS		128
+#define VECTOR_CONST_BYTES		(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_HALF_WORDS		(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_WORDS		(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_DOUBLE_WORDS	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT double_words[VECTOR_CONST_DOUBLE_WORDS];
+  unsigned int words[VECTOR_CONST_WORDS];
+  unsigned short half_words[VECTOR_CONST_HALF_WORDS];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  bool fp_constant_p;			/* Is the constant floating point?  */
+  bool all_double_words_same;		/* Are the double words all equal?  */
+  bool all_words_same;			/* Are the words all equal?  */
+  bool all_half_words_same;		/* Are the halft words all equal?  */
+  bool all_bytes_same;			/* Are the bytes all equal?  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..353ec2b572d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,348 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_WORDS];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+
+  /* Mark that this constant involes floating point.  */
+  vec_const->fp_constant_p = true;
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vec_const->all_double_words_same)
+    return false;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vec_const->all_bytes_same
+      || vec_const->all_half_words_same
+      || vec_const->all_words_same)
+    return false;
+
+  unsigned HOST_WIDE_INT value = vec_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != VECTOR_CONST_DF_NAN
+      && value != VECTOR_CONST_DF_NANS
+      && value != VECTOR_CONST_DF_INF
+      && value != VECTOR_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* Set up the vector bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_HALF_WORDS; i++)
+    vec_const->half_words[i] = ((vec_const->bytes[2*i] << 8)
+				| vec_const->bytes[(2 * i) + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_WORDS; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4 * i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_DOUBLE_WORDS; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8 * i) + j];
+
+      vec_const->double_words[i] = d_word;
+    }
+
+  /* Determine if the double words, words, half words, and bytes are all
+     equal.  */
+  unsigned HOST_WIDE_INT first_dword = vec_const->double_words[0];
+  vec_const->all_double_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_DOUBLE_WORDS; i++)
+    if (first_dword != vec_const->double_words[i])
+      vec_const->all_double_words_same = false;
+
+  unsigned int first_word = vec_const->words[0];
+  vec_const->all_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_WORDS; i++)
+    if (first_word != vec_const->words[i])
+      vec_const->all_words_same = false;
+
+  unsigned short first_hword = vec_const->half_words[0];
+  vec_const->all_half_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_HALF_WORDS; i++)
+    if (first_hword != vec_const->half_words[i])
+      vec_const->all_half_words_same = false;
+
+  unsigned char first_byte = vec_const->bytes[0];
+  vec_const->all_bytes_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_BYTES; i++)
+    if (first_byte != vec_const->bytes[i])
+      vec_const->all_bytes_same = false;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..5d830e0db15 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,47 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_vec_const vec_const;
+
+  if (!vec_const_to_bytes (src, <MODE>mode, &vec_const))
+    gcc_unreachable ();
+
+  if (vec_const_use_xxspltidp (&vec_const))
+    {
+      rtx imm = GEN_INT (vec_const.xxspltidp_immediate);
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, imm));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
+
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..c8518496339 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..13b56279565 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eP
+A scalar floating point constant or a vector constant that can be
+loaded with one prefixed instruction to a VSX register.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-18 17:02 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-18 17:02 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:36dfb31512eaa866cc27741b300f6ef38671a640

commit 36dfb31512eaa866cc27741b300f6ef38671a640
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 18 13:01:36 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    A framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This
    makes it easy for easy_fp_constant and easy_vector_constant to have
    multiple checks (such as for XXSPLTIW, LXVKQ, and XXSPLTI32DX) that each
    want to build the bitmask of what the vector constant looks like.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-18  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (UNSPEC_VSX_PREFIXED_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            * constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  51 +++
 gcc/config/rs6000/rs6000-protos.h                  |  27 ++
 gcc/config/rs6000/rs6000.c                         | 387 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  82 ++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 +-
 gcc/doc/md.texi                                    |   4 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 14 files changed, 809 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..4b2bbdf40e8 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,38 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxspltidp (&vec_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +698,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..8eef955237a 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,32 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS		128
+#define VECTOR_CONST_BYTES		(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_HALF_WORDS		(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_WORDS		(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_DOUBLE_WORDS	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT double_words[VECTOR_CONST_DOUBLE_WORDS];
+  unsigned int words[VECTOR_CONST_WORDS];
+  unsigned short half_words[VECTOR_CONST_HALF_WORDS];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  bool fp_constant_p;			/* Is the constant floating point?  */
+  bool all_double_words_same;		/* Are the double words all equal?  */
+  bool all_words_same;			/* Are the words all equal?  */
+  bool all_half_words_same;		/* Are the halft words all equal?  */
+  bool all_bytes_same;			/* Are the bytes all equal?  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..353ec2b572d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,348 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_WORDS];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+
+  /* Mark that this constant involes floating point.  */
+  vec_const->fp_constant_p = true;
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vec_const->all_double_words_same)
+    return false;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vec_const->all_bytes_same
+      || vec_const->all_half_words_same
+      || vec_const->all_words_same)
+    return false;
+
+  unsigned HOST_WIDE_INT value = vec_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != VECTOR_CONST_DF_NAN
+      && value != VECTOR_CONST_DF_NANS
+      && value != VECTOR_CONST_DF_INF
+      && value != VECTOR_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* Set up the vector bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_HALF_WORDS; i++)
+    vec_const->half_words[i] = ((vec_const->bytes[2*i] << 8)
+				| vec_const->bytes[(2 * i) + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_WORDS; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4 * i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_DOUBLE_WORDS; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8 * i) + j];
+
+      vec_const->double_words[i] = d_word;
+    }
+
+  /* Determine if the double words, words, half words, and bytes are all
+     equal.  */
+  unsigned HOST_WIDE_INT first_dword = vec_const->double_words[0];
+  vec_const->all_double_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_DOUBLE_WORDS; i++)
+    if (first_dword != vec_const->double_words[i])
+      vec_const->all_double_words_same = false;
+
+  unsigned int first_word = vec_const->words[0];
+  vec_const->all_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_WORDS; i++)
+    if (first_word != vec_const->words[i])
+      vec_const->all_words_same = false;
+
+  unsigned short first_hword = vec_const->half_words[0];
+  vec_const->all_half_words_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_HALF_WORDS; i++)
+    if (first_hword != vec_const->half_words[i])
+      vec_const->all_half_words_same = false;
+
+  unsigned char first_byte = vec_const->bytes[0];
+  vec_const->all_bytes_same = true;
+  for (size_t i = 1; i < VECTOR_CONST_BYTES; i++)
+    if (first_byte != vec_const->bytes[i])
+      vec_const->all_bytes_same = false;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..b0ead908fe9 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_VSX_PREFIXED_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,43 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_VSX_PREFIXED_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:SFDF [(match_dup 2)] UNSPEC_VSX_PREFIXED_CONST))]
+{
+  rtx src = operands[1];
+  rs6000_vec_const vec_const;
+
+  if (!vec_const_to_bytes (src, <MODE>mode, &vec_const))
+    gcc_unreachable ();
+
+  if (vec_const_use_xxspltidp (&vec_const))
+    operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+
+  else
+    gcc_unreachable ();
+})
+
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..c8518496339 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..13b56279565 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,10 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eP
+A scalar floating point constant or a vector constant that can be
+loaded with one prefixed instruction to a VSX register.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-15  2:42 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-15  2:42 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:22ff3e2ad7a222f8acbb3de998443968912eff9f

commit 22ff3e2ad7a222f8acbb3de998443968912eff9f
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 22:41:45 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added two new constraints.  The eS constraint matches scalars that are
    loaded with one prefixed instruction.  The eV constraint matches vectors
    that are loaded with one prefixed instruction.  I originally used one
    constraint, but TImode has problems when the constant 0x8000000000000000
    is loaded and optimized.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    A framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This
    makes it easy for easy_fp_constant and easy_vector_constant to have
    multiple checks (such as for XXSPLTIW, LXVKQ, and XXSPLTI32DX) that each
    want to build the bitmask of what the vector constant looks like.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-14  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eS): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_scalar_constant_prefixed): New predicate.
            (easy_vector_constant_prefixed): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  12 +
 gcc/config/rs6000/predicates.md                    |  67 ++++
 gcc/config/rs6000/rs6000-protos.h                  |  22 ++
 gcc/config/rs6000/rs6000.c                         | 358 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  73 ++++-
 gcc/doc/md.texi                                    |   8 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 ++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 15 files changed, 888 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..f4f4794eef3 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,18 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A scalar constant that can be loaded into vector registers with one prefixed
+;; instruction such as XXSPLTIDP.
+(define_constraint "eS"
+  "A scalar constant that can be loaded with one prefixed instruction."
+  (match_operand 0 "vsx_prefixed_scalar_constant"))
+
+;; A vector constant that can be loaded into vector registers with one prefixed
+;; instruction such as XXSPLTIDP
+(define_constraint "eV"
+  "A vector constant that can be loaded with one prefixed instruction."
+  (match_operand 0 "vsx_prefixed_vector_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..06d7f34006d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,54 @@
    return 0;
 })
 
+;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
+;; register with one prefixed instruction, such as XXSPLTIDP.
+
+(define_predicate "vsx_prefixed_scalar_constant"
+  (match_code "const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxspltidp (&vec_const))
+    return true;
+
+  return false;
+})
+
+;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
+;; register with one prefixed instruction, such as XXSPLTIDP.
+;;
+;; We have to have separate predicates and constraints for scalars and vectors,
+;; otherwise things get messed up with TImode when you try to load very large
+;; integer constants.
+
+(define_predicate "vsx_prefixed_vector_constant"
+  (match_code "const_vector,vec_duplicate")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxspltidp (&vec_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +714,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..df4ae364bfb 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,27 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS	128
+#define VECTOR_CONST_BYTES	(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_16BIT	(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_32BIT	(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_64BIT	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT];
+  unsigned int words[VECTOR_CONST_32BIT];
+  unsigned short h_words[VECTOR_CONST_16BIT];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+  machine_mode orig_mode;		/* Original mode.  */
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..593903ff8c9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,319 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_32BIT];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0];
+  unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1];
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid 0 since that is easy to generate without using XXSPLTIDP.  */
+  if (df_upper == 0)
+    return false;
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and signalling NaN bit pattern.  Recognize infinity and negative
+     infinity.
+
+     The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the
+     exponent, and 52 bits for the mantissa (not counting the hidden bit used
+     for normal numbers).  NaN values have the exponent set to all 1 bits, and
+     the mantissa non-zero (mantissa == 0 is infinity).  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (df_upper != VECTOR_CONST_DF_NAN
+      && df_upper != VECTOR_CONST_DF_NANS
+      && df_upper != VECTOR_CONST_DF_INF
+      && df_upper != VECTOR_CONST_DF_NEG_INF)
+    {
+      int df_exponent = (df_upper >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* Set up the vector bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_16BIT; i++)
+    vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8)
+			     | vec_const->bytes[2*i + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_32BIT; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4*i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_64BIT; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8*i) + j];
+
+      vec_const->d_words[i] = d_word;
+    }
+
+  /* Remember original mode that the vector/scalar used.  */
+  vec_const->orig_mode = mode;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..79ea4a82b4f 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eS"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eS"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eS"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;        XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eS,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eS,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..ef5f43eb820 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eV,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eV,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
@@ -6458,6 +6478,47 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_mode_iterator XXSPLTIDP [DI SF DF])
+
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "vsx_prefixed_scalar_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_vec_const vec_const;
+
+  if (!vec_const_to_bytes (src, <MODE>mode, &vec_const))
+    gcc_unreachable ();
+
+  if (vec_const_use_xxspltidp (&vec_const))
+    {
+      rtx imm = GEN_INT (vec_const.xxspltidp_immediate);
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, imm));
+      DONE;
+    }
+
+  gcc_unreachable ();
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..4b9ca062688 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,14 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eS
+A scalar constant that can be loaded with one prefixed instruction to
+a VSX register.
+
+@item eV
+A vector constant that can be loaded with one prefixed instruction to
+a VSX register.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-14 16:49 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-14 16:49 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:adaaaf3e50039cb31951b98ea2252c6199f82007

commit adaaaf3e50039cb31951b98ea2252c6199f82007
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 12:48:57 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added a new constraint (eD) to match scalar and vector constants that can be
    loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    Some framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This way for
    instance in easy_fp_constant and easy_vector_constant when we first check
    whether the constant can be generated by XXSPLTIDP, we don't have to build the
    128-bits of the vector for each successive test.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-14  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  36 ++
 gcc/config/rs6000/rs6000-protos.h                  |  22 ++
 gcc/config/rs6000/rs6000.c                         | 367 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 +++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  65 +++-
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 ++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++
 14 files changed, 845 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..d4b50276bac 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,23 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit vector constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a
+;; V2DFmode or V2DI result.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we generate the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  return (vec_const_to_bytes (op, mode, &vec_const)
+	  && vec_const_use_xxspltidp (&vec_const));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +683,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..df4ae364bfb 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,27 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS	128
+#define VECTOR_CONST_BYTES	(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_16BIT	(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_32BIT	(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_64BIT	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT];
+  unsigned int words[VECTOR_CONST_32BIT];
+  unsigned short h_words[VECTOR_CONST_16BIT];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+  machine_mode orig_mode;		/* Original mode.  */
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..3ec59ed2a5e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,328 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_32BIT];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0];
+  unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1];
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer.  */
+  if (df_upper == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = DImode;
+
+  if (!FLOAT_MODE_P (mode) && IN_RANGE (df_upper, -128, 127))
+    return false;
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and signalling NaN bit pattern.  Recognize infinity and negative
+     infinity.
+
+     The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the
+     exponent, and 52 bits for the mantissa (not counting the hidden bit used
+     for normal numbers).  NaN values have the exponent set to all 1 bits, and
+     the mantissa non-zero (mantissa == 0 is infinity).  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (df_upper != VECTOR_CONST_DF_NAN
+      && df_upper != VECTOR_CONST_DF_NANS
+      && df_upper != VECTOR_CONST_DF_INF
+      && df_upper != VECTOR_CONST_DF_NEG_INF)
+    {
+      int df_exponent = (df_upper >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* Set up the vector bits.  */
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* Fail if the floating point constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	/* SFmode stored as scalars are stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	/* Fail if the vector constant is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+	/* Treat VEC_DUPLICATE of a constant just like a vector constant.  */
+    case VEC_DUPLICATE:
+      {
+	/* Fail if the vector duplicate is the wrong mode.  */
+	if (mode == VOIDmode)
+	  mode = GET_MODE (op);
+
+	else if (GET_MODE (op) != mode)
+	  return false;
+
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+
+	if (!CONST_INT_P (ele) && !CONST_DOUBLE_P (ele))
+	  return false;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    size_t byte_num = num * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_16BIT; i++)
+    vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8)
+			     | vec_const->bytes[2*i + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_32BIT; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4*i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_64BIT; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8*i) + j];
+
+      vec_const->d_words[i] = d_word;
+    }
+
+  /* Remember original mode that the vector/scalar used.  */
+  vec_const->orig_mode = mode;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..cf42b6d2058 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;        XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..67ba121ed77 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eD,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eD,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
@@ -6458,6 +6478,39 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_mode_iterator XXSPLTIDP [DI SF DF])
+
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  rs6000_vec_const vec_const;
+  if (!vec_const_to_bytes (operands[1], <MODE>mode, &vec_const)
+      || !vec_const_use_xxspltidp (&vec_const))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-14 14:57 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-14 14:57 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:7a499ee942874a0ac851dd505dbd345fe8577993

commit 7a499ee942874a0ac851dd505dbd345fe8577993
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 10:57:14 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added a new constraint (eD) to match scalar and vector constants that can be
    loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    Some framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This way for
    instance in easy_fp_constant and easy_vector_constant when we first check
    whether the constant can be generated by XXSPLTIDP, we don't have to build the
    128-bits of the vector for each successive test.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-14  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  36 +++
 gcc/config/rs6000/rs6000-protos.h                  |  22 ++
 gcc/config/rs6000/rs6000.c                         | 326 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  65 +++-
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++
 14 files changed, 804 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..d4b50276bac 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,15 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_vec_const vec_const;
+
+  if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +618,23 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit vector constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a
+;; V2DFmode or V2DI result.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we generate the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  return (vec_const_to_bytes (op, mode, &vec_const)
+	  && vec_const_use_xxspltidp (&vec_const));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +683,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (TARGET_POWER10 && vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..df4ae364bfb 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,27 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS	128
+#define VECTOR_CONST_BYTES	(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_16BIT	(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_32BIT	(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_64BIT	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT];
+  unsigned int words[VECTOR_CONST_32BIT];
+  unsigned short h_words[VECTOR_CONST_16BIT];
+  unsigned char bytes[VECTOR_CONST_BYTES];
+  machine_mode orig_mode;		/* Original mode.  */
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..6d9359b6e88 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (TARGET_POWER10 && vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28632,287 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_32BIT];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0];
+  unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1];
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer.  */
+  if (df_upper == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = DImode;
+
+  if (!FLOAT_MODE_P (mode) && IN_RANGE (df_upper, -128, 127))
+    return false;
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and signalling NaN bit pattern.  Recognize infinity and negative
+     infinity.
+
+     The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the
+     exponent, and 52 bits for the mantissa (not counting the hidden bit used
+     for normal numbers).  NaN values have the exponent set to all 1 bits, and
+     the mantissa non-zero (mantissa == 0 is infinity).  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (df_upper != VECTOR_CONST_DF_NAN
+      && df_upper != VECTOR_CONST_DF_NANS
+      && df_upper != VECTOR_CONST_DF_INF
+      && df_upper != VECTOR_CONST_DF_NEG_INF)
+    {
+      int df_exponent = (df_upper >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->xxspltidp_immediate = sf_value;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* If we don't know the size of the constant, punt.  */
+  if (mode == VOIDmode)
+    return false;
+
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	/* Scalars are treated as 64-bit integers.  */
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* SFmode stored as scalars is stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  Also handle VEC_DUPLICATE.  */
+    case CONST_VECTOR:
+    case VEC_DUPLICATE:
+      {
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_16BIT; i++)
+    vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8)
+			     | vec_const->bytes[2*i + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_32BIT; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4*i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_64BIT; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8*i) + j];
+
+      vec_const->d_words[i] = d_word;
+    }
+
+  /* Remember original mode that the vector/scalar used.  */
+  vec_const->orig_mode = mode;
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..cf42b6d2058 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;        XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..6be3376f5d1 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTIDP
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                wD,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTIDP
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eD,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
@@ -6458,6 +6478,39 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_mode_iterator XXSPLTIDP [DI SF DF])
+
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  rs6000_vec_const vec_const;
+  if (!vec_const_to_bytes (operands[1], <MODE>mode, &vec_const)
+      || !vec_const_use_xxspltidp (&vec_const))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-14  3:24 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-14  3:24 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:59aa49d6fe6278fa865e4c836ef9e19fa2fd5ae9

commit 59aa49d6fe6278fa865e4c836ef9e19fa2fd5ae9
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 13 23:24:22 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added a new constraint (eD) to match scalar and vector constants that can be
    loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    Some framework is provided in this patch which will also be used in future
    patches adding LXVKQ and XXSPLTIW support (possibly XXSPLTI32DX).  This way for
    instance in easy_fp_constant and easy_vector_constant when we first check
    whether the constant can be generated by XXSPLTIDP, we don't have to build the
    128-bits of the vector for each successive test.
    
    While the PowerPC is currently limited to 128-bit vectors, I have written
    the code so it can be changed in the future if we ever have larger vection
    sizes.
    
    2021-10-13  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_poin): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  40 +++
 gcc/config/rs6000/rs6000-protos.h                  |  26 ++
 gcc/config/rs6000/rs6000.c                         | 325 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  58 +++-
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++
 14 files changed, 796 insertions(+), 36 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..ddad7ca3ae9 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,16 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* See if the constant can be generated with the ISA 3.1
+     instructions.  */
+  rs6000_vec_const vec_const;
+
+  if (vec_const_to_bytes (op, mode, &vec_const))
+    {
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +619,26 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit vector constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a
+;; V2DFmode or V2DI result.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Convert the vector constant to bytes.  */
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+
+  return vec_const_use_xxspltidp (&vec_const);
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +687,16 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      /* See if the constant can be generated with the ISA 3.1
+         instructions.  */
+      rs6000_vec_const vec_const;
+
+      if (vec_const_to_bytes (op, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    return true;
+	}
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..da9502bcb33 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -222,6 +223,31 @@ address_is_prefixed (rtx addr,
   return (iform == INSN_FORM_PREFIXED_NUMERIC
 	  || iform == INSN_FORM_PCREL_LOCAL);
 }
+
+/* Functions and data structures relating to 128-bit vector constants.  All
+   fields are kept in big endian order.  */
+#define VECTOR_CONST_BITS	128
+#define VECTOR_CONST_BYTES	(VECTOR_CONST_BITS / 8)
+#define VECTOR_CONST_16BIT	(VECTOR_CONST_BITS / 16)
+#define VECTOR_CONST_32BIT	(VECTOR_CONST_BITS / 32)
+#define VECTOR_CONST_64BIT	(VECTOR_CONST_BITS / 64)
+
+typedef struct {
+  /* Vector constant as various sized items.  */
+  unsigned char bytes[VECTOR_CONST_BYTES];
+  unsigned short h_words[VECTOR_CONST_16BIT];
+  unsigned int words[VECTOR_CONST_32BIT];
+  unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT];
+  machine_mode orig_mode;		/* Original mode.  */
+  enum rtx_code orig_code;		/* Original rtx code.  */
+  bool is_xxspltidp;			/* Use XXSPLTIDP to load constant.  */
+  machine_mode xxspltidp_mode;		/* Mode to use for XXSPLTIDP.  */
+  unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  bool is_prefixed;			/* Prefixed instruction used.  */
+} rs6000_vec_const;
+
+extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..05b2691d38a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      rs6000_vec_const vec_const;
+      if (vec_const_to_bytes (vec, mode, &vec_const))
+	{
+	  if (vec_const_use_xxspltidp (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+	      return "xxspltidp %x0,%2";
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26734,44 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_vec_const vec_const;
+  if (vec_const_to_bytes (src, mode, &vec_const))
+    {
+      if (vec_const.is_prefixed)
+	return true;
+
+      if (vec_const_use_xxspltidp (&vec_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28587,6 +28635,283 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+\f
+/* Copy an integer constant to the vector constant structure.  */
+
+static void
+vec_const_integer (rtx op,
+		   machine_mode mode,
+		   size_t byte_num,
+		   rs6000_vec_const *vec_const)
+{
+  unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+
+  for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+    vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+}
+
+/* Copy an floating point constant to the vector constant structure.  */
+
+static void
+vec_const_floating_point (rtx op,
+			  machine_mode mode,
+			  size_t byte_num,
+			  rs6000_vec_const *vec_const)
+{
+  unsigned bitsize = GET_MODE_BITSIZE (mode);
+  unsigned num_words = bitsize / 32;
+  const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+  long real_words[VECTOR_CONST_32BIT];
+
+  /* Make sure we don't overflow the real_words array and that it is
+     filled completely.  */
+  gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0);
+
+  real_to_target (real_words, rtype, mode);
+
+  /* Iterate over each 32-bit word in the floating point constant.  The
+     real_to_target function puts out words in endian fashion.  We need
+     to arrange so the words are written in big endian order.  */
+  for (unsigned num = 0; num < num_words; num++)
+    {
+      unsigned endian_num = (BYTES_BIG_ENDIAN
+			     ? num
+			     : num_words - 1 - num);
+
+      unsigned uvalue = real_words[endian_num];
+      for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff;
+    }
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0];
+  unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1];
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer.  */
+  if (df_upper == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = DImode;
+
+  if (!FLOAT_MODE_P (mode) && IN_RANGE (df_upper, -128, 127))
+    return false;
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and signalling NaN bit pattern.  Recognize infinity and negative
+     infinity.
+
+     The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the
+     exponent, and 52 bits for the mantissa (not counting the hidden bit used
+     for normal numbers).  NaN values have the exponent set to all 1 bits, and
+     the mantissa non-zero (mantissa == 0 is infinity).  */
+
+#define VECTOR_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+#define VECTOR_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+#define VECTOR_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+#define VECTOR_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (df_upper != VECTOR_CONST_DF_NAN
+      && df_upper != VECTOR_CONST_DF_NANS
+      && df_upper != VECTOR_CONST_DF_INF
+      && df_upper != VECTOR_CONST_DF_NEG_INF)
+    {
+      int df_exponent = (df_upper >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vec_const->words[0], vec_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIDP.  */
+  vec_const->is_xxspltidp = true;
+  vec_const->is_prefixed = true;
+  vec_const->xxspltidp_immediate = sf_value;
+  vec_const->xxspltidp_mode = FLOAT_MODE_P (mode) ? E_DFmode : E_DImode;
+
+  return true;
+}
+
+/* Convert a vector constant to an internal structure, breaking it out to
+   bytes, half words, words, and double words.  Return true if we have
+   successfully broken it out.  */
+
+bool
+vec_const_to_bytes (rtx op,
+		    machine_mode mode,
+		    rs6000_vec_const *vec_const)
+{
+  /* Initialize vec const structure.  */
+  memset ((void *)vec_const, 0, sizeof (rs6000_vec_const));
+
+  /* If we don't know the size of the constant, punt.  */
+  if (mode == VOIDmode)
+    return false;
+
+  switch (GET_CODE (op))
+    {
+      /* Integer constants, default to double word.  */
+    case CONST_INT:
+      {
+	if (mode == VOIDmode)
+	  mode = DImode;
+
+	vec_const_integer (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	/* SFmode stored as scalars is stored in DFmode format.  */
+	if (mode == SFmode)
+	  mode = DFmode;
+
+	vec_const_floating_point (op, mode, 0, vec_const);
+
+	/* Splat the constant to the rest of the vector constant structure.  */
+	unsigned size = GET_MODE_SIZE (mode);
+	gcc_assert (size <= VECTOR_CONST_BYTES);
+	gcc_assert ((VECTOR_CONST_BYTES % size) == 0);
+
+	for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size)
+	  memcpy ((void *) &vec_const->bytes[splat],
+		  (void *) &vec_const->bytes[0],
+		  size);
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  Also handle VEC_DUPLICATE.  */
+    case CONST_VECTOR:
+    case VEC_DUPLICATE:
+      {
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = (GET_CODE (op) == VEC_DUPLICATE
+		       ? XEXP (op, 0)
+		       : CONST_VECTOR_ELT (op, num));
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (CONST_INT_P (ele))
+	      vec_const_integer (ele, ele_mode, byte_num, vec_const);
+	    else if (CONST_DOUBLE_P (ele))
+	      vec_const_floating_point (ele, ele_mode, byte_num, vec_const);
+	    else
+	      return false;
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  /* Pack half words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_16BIT; i++)
+    vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8)
+			     | vec_const->bytes[2*i + 1]);
+
+  /* Pack words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_32BIT; i++)
+    {
+      unsigned word = 0;
+      for (size_t j = 0; j < 4; j++)
+	word = (word << 8) | vec_const->bytes[(4*i) + j];
+
+      vec_const->words[i] = word;
+    }
+
+  /* Pack double words together.  */
+  for (size_t i = 0; i < VECTOR_CONST_64BIT; i++)
+    {
+      unsigned HOST_WIDE_INT d_word = 0;
+      for (size_t j = 0; j < 8; j++)
+	d_word = (d_word << 8) | vec_const->bytes[(8*i) + j];
+
+      vec_const->d_words[i] = d_word;
+    }
+
+  /* Remember original mode and code.  */
+  vec_const->orig_mode = mode;
+  vec_const->orig_code = GET_CODE (op);
+
+  return true;
+}
+
+\f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..cf42b6d2058 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;        XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..7b2d2551c7b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,17 +1192,17 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
+;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)  XXLSPLTIDP
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
-                ?wa,       v,         <??r>,     wZ,        v")
+                ?wa,       v,         <??r>,     wZ,        v,         wa")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                ?jwM,      W,         <nW>,      v,         wZ"))]
+                ?jwM,      W,         <nW>,      v,         wZ,        eD"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1213,37 +1213,37 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecsimple, *,         *,         vecstore,  vecload")
+                vecsimple, *,         *,         vecstore,  vecload,   vecperm")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         5,         2,         *,         *")
+                *,         5,         2,         *,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         *,         *,         *,         *")
+                *,         *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
-                *,         20,        8,         *,         *")
+                *,         20,        8,         *,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
-                <VSisa>,   *,         *,         *,         *")])
+                <VSisa>,   *,         *,         *,         *,         p10")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
-;;              LVX (VMX)  STVX (VMX)
+;;              LVX (VMX)  STVX (VMX) XXSPLTID   LXVKQ
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
                 wa,        v,         ?wa,       v,         <??r>,
-                wZ,        v")
+                wZ,        v,         wa")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
                 wE,        jwM,       ?jwM,      W,         <nW>,
-                v,         wZ"))]
+                v,         wZ,        eD"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1254,15 +1254,15 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
                 vecsimple, vecsimple, vecsimple, *,         *,
-                vecstore,  vecload")
+                vecstore,  vecload,   vecperm")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
                 *,         *,         *,         20,        16,
-                *,         *")
+                *,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 p9v,       *,         <VSisa>,   *,         *,
-                *,         *")])
+                *,         *,         p10")])
 
 ;; Explicit  load/store expanders for the builtin functions
 (define_expand "vsx_load_<mode>"
@@ -6458,6 +6458,36 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_mode_iterator XXSPLTIDP [DI SF DF V16QI V8HI V4SI V4SF V2DF V2DI])
+
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  rs6000_vec_const vec_const;
+  if (!vec_const_to_bytes (operands[1], <MODE>mode, &vec_const)
+      || !vec_const_use_xxspltidp (&vec_const))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
@ 2021-10-12 22:05 Michael Meissner
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Meissner @ 2021-10-12 22:05 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:cae098b61f80cdaa5ca91981205277021ab70871

commit cae098b61f80cdaa5ca91981205277021ab70871
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 12 18:00:01 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added a new constraint (eD) to match scalar and vector constants that can be
    loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is to
    look at vector constants in general to see if the bits of the vector map
    into values that can be loaded with XXSPLTIDP.
    
    2021-10-12  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate):
            New declaration.
            (convert_vector_constant_to_bytes): Likewise.
            (convert_scalar_64bit_constant_to_bytes): Likewise.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (convert_vector_constant_to_bytes): New
            helper function.
            (convert_scalar_64bit_constant_to_bytes): Likewise.
            (xxspltidp_constant_immediate): Likewise.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    | 112 +++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   6 +
 gcc/config/rs6000/rs6000.c                         | 254 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 +++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  26 +++
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 ++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++
 14 files changed, 759 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..20da6faf6e7 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,10 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* See if the constant can be generated with the XXSPLTIDP instruction.  */
+  if (easy_vector_constant_64bit_element (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +613,111 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit vector constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a
+;; V2DFmode or V2DI result.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  unsigned char vector_bytes[16];
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* We use DImode for integer constants and DFmode for floating point
+     constants, since SFmode scalars are stored as DFmode in the PowerPC.  */
+  if (CONST_INT_P (op) || CONST_DOUBLE_P (op))
+    {
+      if (!convert_scalar_64bit_constant_to_bytes (op, vector_bytes,
+						   sizeof (vector_bytes)))
+	return false;
+
+      /* Change mode to value in the vector register.  */
+      mode = (CONST_INT_P (op)) ? E_DImode : E_DFmode;
+    }
+
+  /* For vector constants, get the whole vector.  */
+  else if (!convert_vector_constant_to_bytes (op, mode, vector_bytes,
+					      sizeof (vector_bytes)))
+    return false;
+
+  /* The vector_bytes array has the 8 bytes of the upper and lower constants in
+     big endian order. Convert these into 64-bit constants.  */
+  unsigned HOST_WIDE_INT df_upper = 0, df_lower = 0;
+  for (int i = 0; i < 8; i++)
+    {
+      df_upper = (df_upper << 8) | vector_bytes[i];
+      df_lower = (df_lower << 8) | vector_bytes[i+8];
+    }
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (df_upper != df_lower)
+    return false;
+  
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  if (INTEGRAL_MODE_P (mode) && IN_RANGE (df_upper, -128, 127))
+    return false;
+
+  /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit floating
+     format has 1 bit for sign, 11 bits for the exponent, and 52 bits for the
+     mantissa.  NaN values have the exponent set to all 1 bits, and the
+     mantissa non-zero (mantissa == 0 is infinity).  */
+
+  int df_exponent = (df_upper >> 52) & 0x7ff;
+  HOST_WIDE_INT df_mantissa = df_upper & HOST_WIDE_INT_C (0x1fffffffffffff);
+
+  if (df_exponent == 0x7ff && df_mantissa != 0)		/* NaN.  */
+    return false;
+
+  /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+     the exponent all 0 bits, and the mantissa non-zero.  If the value is
+     subnormal, then the hidden bit in the mantissa is not set.  */
+  if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+    return false;
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2];
+  df_words[0] = (df_upper >> 32) & 0xffffffff;
+  df_words[1] = df_upper & 0xffffffff;
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -657,6 +766,9 @@
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       return easy_altivec_constant (op, mode);
     }
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..574b3d5e17e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,10 +32,15 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
 extern int num_insns_constant (rtx, machine_mode);
+extern bool convert_vector_constant_to_bytes (rtx, machine_mode,
+					      unsigned char [], size_t);
+extern bool convert_scalar_64bit_constant_to_bytes (rtx, unsigned char [],
+						    size_t);
 extern int small_data_operand (rtx, machine_mode);
 extern bool mem_operand_gpr (rtx, machine_mode);
 extern bool mem_operand_ds_form (rtx, machine_mode);
@@ -198,6 +203,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..a433c7c9386 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6502,6 +6502,179 @@ num_insns_constant (rtx op, machine_mode mode)
   return num_insns_constant_multi (val, mode);
 }
 
+/* Convert a constant OP which has mode MODE into into BYTES (which is
+   NUM_BYTES long).  Return false if the we cannot convert the constant to a
+   series of bytes.  This code supports normal constants and vector constants.
+   In addition to CONST_VECTOR, we also support constant vectors formed with
+   VEC_DUPLICATE.  Return false if we don't recognize the constant.  If OP is a
+   scalar, it is assumed to be a vector element.
+
+   We return the bytes in big endian order, i.e. for 128-bit vectors, byte 0 is
+   the most significant byte, and byte 15 is the least significant byte.  */
+
+bool
+convert_vector_constant_to_bytes (rtx op,
+				  machine_mode mode,
+				  unsigned char bytes[],
+				  size_t num_bytes)
+{
+  /* If we don't know the size of the constant, punt.  */
+  if (mode == VOIDmode)
+    return false;
+
+  /* Is the buffer too small?  Punt.  */
+  if (num_bytes < GET_MODE_SIZE (mode))
+    return false;
+
+  switch (GET_CODE (op))
+    {
+      /* Integer constants.  */
+    case CONST_INT:
+      {
+	unsigned bitsize = GET_MODE_BITSIZE (mode);
+	unsigned HOST_WIDE_INT uvalue = UINTVAL (op);
+	size_t byte_num = 0;
+
+	for (int shift = bitsize - 8; shift >= 0; shift -= 8)
+	  bytes[byte_num++] = (uvalue >> shift) & 0xff;
+
+	break;
+      }
+
+      /* Floating point constants.  */
+    case CONST_DOUBLE:
+      {
+	unsigned bitsize = GET_MODE_BITSIZE (mode);
+	unsigned num_words = bitsize / 32;
+	const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op);
+	size_t byte_num = 0;
+	long real_words[4];
+
+	/* Make sure we don't overflow the real_words array and that it is
+	   filled completely.  */
+	if (bitsize > 128 || (bitsize % 32) != 0)
+	  return false;
+
+	real_to_target (real_words, rtype, mode);
+
+	/* Iterate over each 32-bit word in the floating point constant.  The
+	   real_to_target function puts out words in endian fashion.  We need
+	   to arrange so the bytes are written in big endian order.  */
+	for (unsigned num = 0; num < num_words; num++)
+	  {
+	    unsigned endian_num = (BYTES_BIG_ENDIAN
+				   ? num
+				   : num_words - 1 - num);
+
+	    unsigned uvalue = real_words[endian_num];
+	    for (int shift = 32 - 8; shift >= 0; shift -= 8)
+	      bytes[byte_num++] = (uvalue >> shift) & 0xff;
+	  }
+
+	break;
+      }
+
+      /* Vector constants, iterate each element.  On little endian systems, we
+	 have to reverse the element numbers.  */
+    case CONST_VECTOR:
+      {
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    rtx ele = CONST_VECTOR_ELT (op, num);
+	    size_t byte_num = (BYTES_BIG_ENDIAN
+			       ? num
+			       : nunits - 1 - num) * size;
+
+	    if (!convert_vector_constant_to_bytes (ele, ele_mode,
+						   &bytes[byte_num], size))
+	      return false;
+	  }
+
+	break;
+      }
+
+      /* Vector constants, formed with VEC_DUPLICATE of a constant.  */
+    case VEC_DUPLICATE:
+      {
+	machine_mode ele_mode = GET_MODE_INNER (mode);
+	size_t nunits = GET_MODE_NUNITS (mode);
+	size_t size = GET_MODE_SIZE (ele_mode);
+	rtx ele = XEXP (op, 0);
+	size_t byte_num = 0;
+
+	for (size_t num = 0; num < nunits; num++)
+	  {
+	    if (!convert_vector_constant_to_bytes (ele, ele_mode,
+						   &bytes[byte_num], size))
+	      return false;
+
+	    byte_num += size;
+	  }
+
+	break;
+      }
+
+      /* Any thing else, just return failure.  */
+    default:
+      return false;
+    }
+
+  return true;
+}
+
+/* Convert a CONST_INT or CONST_DOUBLE OP which has mode MODE into into BYTES
+   (which is NUM_BYTES long).  Return false if the we cannot convert the
+   constant to a series of bytes.  This function used for the XXSPLTIDP and
+   XXSPLTI32DX instructions that load up a vector register with a value into
+   the upper 64-bits of the vector register and then is splatted to the lower
+   64-bits.
+
+   We return the bytes in big endian order, i.e. for 128-bit vectors, byte 0 is
+   the most significant byte, and byte 15 is the least significant byte.  */
+
+bool
+convert_scalar_64bit_constant_to_bytes (rtx op,
+					unsigned char bytes[],
+					size_t num_bytes)
+{
+  machine_mode mode;
+
+  /* We use DImode for integer constants and DFmode for floating point
+     constants, since SFmode scalars are stored as DFmode in the PowerPC.  */
+  if (CONST_INT_P (op))
+    mode = DImode;
+
+  else if (CONST_DOUBLE_P (op))
+    {
+      if (GET_MODE (op) != SFmode && GET_MODE (op) != DFmode)
+	return false;
+
+      mode = DFmode;
+    }
+
+  else
+    return false;
+
+  /* Verify that the buffer is either scalar sized (64-bits) or vector
+     sized.  */
+  if (num_bytes != 8 && num_bytes != 16)
+    return false;
+
+  if (!convert_vector_constant_to_bytes (op, mode, bytes, 8))
+    return false;
+
+  /* If the caller wanted the bytes in a vector size, duplicate the bytes to
+     mimic the behavior of the XXSPLTIDP and XXSPLTI32DX instructions.  */
+  if (num_bytes == 16)
+    memcpy (&bytes[8], &bytes[0], 8);
+
+  return true;
+}
+
 /* Interpret element ELT of the CONST_VECTOR OP as an integer value.
    If the mode of OP is MODE_VECTOR_INT, this simply returns the
    corresponding element of the vector, but for V4SFmode, the
@@ -6946,6 +7119,58 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the 32-bit immediate value that is used for the XXSPLTIDP instruction
+   to load a DFmode value that is splatted into a 128-bit vector.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+  unsigned char vector_bytes[16];
+
+  gcc_assert (easy_vector_constant_64bit_element (op, mode));
+
+  /* We use DImode for integer constants and DFmode for floating point
+     constants, since SFmode scalars are stored as DFmode in the PowerPC.  */
+  if (CONST_INT_P (op) || CONST_DOUBLE_P (op))
+    {
+      if (!convert_scalar_64bit_constant_to_bytes (op, vector_bytes,
+						   sizeof (vector_bytes)))
+	gcc_unreachable ();
+
+      /* Change mode to value in the vector register.  */
+      mode = CONST_INT_P (op) ? E_DImode : E_DFmode;
+    }
+
+  /* For vector constants, get the whole vector.  */
+  else if (!convert_vector_constant_to_bytes (op, mode, vector_bytes,
+					      sizeof (vector_bytes)))
+    gcc_unreachable ();
+
+  /* The vector_bytes array has the 8 bytes of the upper and lower constants in
+     big endian order. Convert the upper 8 bytes into a 64-bit constant.  The
+     64-bit constant is represented by a pair of 32-bit constants.  Then
+     convert it to a DFmode constant.  The real value support functions take
+     things in target endian order, so we will need to swap things on little
+     endian.  */
+  long df_upper = 0, df_lower = 0;
+  for (int i = 0; i < 4; i++)
+    {
+      df_upper = (df_upper << 8) | vector_bytes[i];
+      df_lower = (df_lower << 8) | vector_bytes[i+4];
+    }
+
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_upper, df_lower);
+
+  long df_words[2] = { df_upper, df_lower };
+  REAL_VALUE_TYPE r;
+  real_from_target (&r, &df_words[0], DFmode);
+  real_to_target (&ret, &r, SFmode);
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7215,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26955,29 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (easy_vector_constant_64bit_element (src, mode))
+    return true;
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..9ac5b8df173 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..48ecb41801c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -6458,6 +6458,32 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_mode_iterator XXSPLTIDP [DI SF DF V16QI V8HI V4SI V4SF V2DF V2DI])
+
+(define_insn "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and vector constants that look like DFmode floating point
+;; values where both elements are the same.  The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+  })
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2021-10-21  2:33 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-14 15:56 [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10 Michael Meissner
  -- strict thread matches above, loose matches on Subject: below --
2021-10-21  2:33 Michael Meissner
2021-10-20 22:44 Michael Meissner
2021-10-20 21:46 Michael Meissner
2021-10-18 17:53 Michael Meissner
2021-10-18 17:02 Michael Meissner
2021-10-15  2:42 Michael Meissner
2021-10-14 16:49 Michael Meissner
2021-10-14 14:57 Michael Meissner
2021-10-14  3:24 Michael Meissner
2021-10-12 22:05 Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).