[gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.

* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.
@ 2021-10-05 21:36 Michael Meissner
From: Michael Meissner @ 2021-10-05 21:36 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:13f477e7d140c47817fe8c9518ef22bed63a0359

commit 13f477e7d140c47817fe8c9518ef22bed63a0359
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 17:36:22 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added two new constraints (eF and eV) to match scalar and vector constants
    that can be loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests that check loading SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eF): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the constant is easy.
            (easy_fp_constant_64bit_scalar): New predicate.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New
            declaration.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (xxspltidp_constant_immediate): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for the
            xxsplti* prefixed instructions.
            (movsf_hardfloat): Add XXSPLTIDP support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
            support.
            (vsx_move<mode>_32bit): Likewise.
            (XXSPLTIDP_S): New mode iterator.
            (XXSPLTIDP_V): Likewise.
            (XXSPLTIDP): Likewise.
            (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
            iterated form that also does SFmode, DFmode, DImode, and
            V2DImode.
            (xxspltidp_<mode>_internal): New insn and splits.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eF and eV constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
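
The rule described in the commit message (a constant is loadable only if it fits as an SFmode value) can be illustrated with a minimal standalone C sketch. This is not code from the patch: `xxspltidp_imm_for_double` is a hypothetical helper name, and the sketch assumes the host uses IEEE 754 binary32/binary64 for `float`/`double`. It also rejects +0.0 and SFmode subnormals, mirroring the checks in the new predicate, and rejects NaNs as a simplifying assumption.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a GCC function): return true and store the
   32-bit single-precision bit pattern in *IMM if D could be used as an
   XXSPLTIDP-style immediate under the rules sketched above.  */
static bool
xxspltidp_imm_for_double (double d, uint32_t *imm)
{
  /* Assumption of this sketch: NaNs are never accepted.  */
  if (isnan (d))
    return false;

  /* +0.0 is cheap to create without a splat-immediate instruction.  */
  if (d == 0.0 && !signbit (d))
    return false;

  /* Finite values outside the float range cannot be converted safely.  */
  if (!isinf (d) && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  /* The value must survive a round trip through single precision.  */
  float f = (float) d;
  if ((double) f != d)
    return false;

  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);

  /* Reject single-precision subnormals: exponent field is zero and
     the mantissa field is non-zero.  */
  uint32_t exponent = (bits >> 23) & 0xff;
  uint32_t mantissa = bits & 0x7fffff;
  if (exponent == 0 && mantissa != 0)
    return false;

  *imm = bits;
  return true;
}
```

For instance, 1.0 would yield the immediate 0x3f800000, while 0.1 is rejected because it has no exact single-precision representation.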

Diff:
---
 gcc/config/rs6000/constraints.md                   |  10 ++
 gcc/config/rs6000/predicates.md                    | 140 +++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         |  96 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  60 ++++++++-
 gcc/doc/md.texi                                    |   6 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++++++
 14 files changed, 663 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..1ff46c9f4fc 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,11 +208,21 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; DI/SF/DF scalar constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eF"
+  "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_fp_constant_64bit_scalar"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; V2DI/V2DF vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eV"
+  "A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..7544ac87700 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (easy_fp_constant_64bit_scalar (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +614,138 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit scalar constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a V2DF or
+;; V2DI mode result that is interpreted as a 64-bit scalar.
+(define_predicate "easy_fp_constant_64bit_scalar"
+  (match_code "const_int,const_double")
+{
+  const REAL_VALUE_TYPE *rv;
+  REAL_VALUE_TYPE rv_type;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Don't return true for 0.0 or 0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* Handle DImode by creating a DF value from it.  */
+  if (CONST_INT_P (op) && (mode == DImode || mode == VOIDmode))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+
+      /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit
+         floating format has 1 bit for sign, 11 bits for the exponent,
+         and 52 bits for the mantissa.  NaN values have the exponent set
+         to all 1 bits, and the mantissa non-zero (mantissa == 0 is
+         infinity).  */
+      int df_exponent = (df_value >> 52) & 0x7ff;
+      HOST_WIDE_INT df_mantissa = df_value & HOST_WIDE_INT_C (0xfffffffffffff);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* NaN.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers
+         have the exponent all 0 bits, and the mantissa non-zero.  If the
+         value is subnormal, then the hidden bit in the mantissa is not
+         set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+
+      long df_words[2];
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes the target words in target order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      real_from_target (&rv_type, df_words, DFmode);
+      rv = &rv_type;
+    }
+
+  /* Handle SFmode/DFmode constants.  Don't allow decimal or IEEE 128-bit
+     binary constants.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    rv = CONST_DOUBLE_REAL_VALUE (op);
+
+  /* We can't handle anything else with the XXSPLTIDP instruction.  */
+  else
+    return false;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
+;; Return 1 if the operand is a 128-bit vector constant with 64-bit elements
+;; that can be loaded via the XXSPLTIDP instruction, which takes a SFmode
+;; value and produces a V2DFmode or V2DImode result.
+;;
+;; We cannot combine the scalar and vector cases, because a combined predicate
+;; would mishandle a matching integer constant that is assigned to a TImode
+;; value.  For example:
+;;
+;;	(set (reg:TI 32)
+;;	     (const_int 0x8000000000000000))
+;;
+;; With a combined predicate, the constant would be splatted into both 64-bit
+;; halves of the vector register, instead of being loaded with zero in the
+;; upper 64 bits and the constant in the lower 64 bits.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate")
+{
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (mode != V2DFmode && mode != V2DImode)
+    return false;
+
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    op = XEXP (op, 0);
+
+  else
+    return false;
+
+  return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +790,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..e9be9c4d99f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad860728169..83d243269e3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,60 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the immediate value used in the XXSPLTIDP instruction.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+
+  /* Handle vectors.  */
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  gcc_assert (easy_fp_constant_64bit_scalar (op, mode));
+
+  /* Handle DImode/V2DImode by creating a DF value from it and then converting
+     the DFmode value to SFmode.  */
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+      long df_words[2];
+
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes input in target endian order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      REAL_VALUE_TYPE r;
+      real_from_target (&r, &df_words[0], DFmode);
+      real_to_target (&ret, &r, SFmode);
+    }
+
+  /* For floating point constants, convert to SFmode.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      const REAL_VALUE_TYPE *rv = CONST_DOUBLE_REAL_VALUE (op);
+      real_to_target (&ret, rv, SFmode);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7044,13 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_fp_constant_64bit_scalar (vec, mode)
+	  || easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26785,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case E_DImode:
+    case E_DFmode:
+    case E_SFmode:
+      return easy_fp_constant_64bit_scalar (src, mode);
+
+    case E_V2DImode:
+    case E_V2DFmode:
+      return easy_vector_constant_64bit_element (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..8afc4b2756d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eF,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eF,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..fa33c9d9fbf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;              XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eV,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eV,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6449,15 +6463,53 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP_S [DI SF DF])
+(define_mode_iterator XXSPLTIDP_V [V2DF V2DI])
+(define_mode_iterator XXSPLTIDP   [DI SF DF V2DF V2DI])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and V2DF and V2DI vector constants where both elements are
+;; the same.  The constant has to be expressible as a SFmode constant that is
+;; not a SFmode denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_S 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_S 1 "easy_fp_constant_64bit_scalar" "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_S [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_V 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_V 1 "easy_vector_constant_64bit_element" "eV"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_V [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7fb7b..5035a3fd604 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,9 +3333,15 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eF
+A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eV
+A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.
@ 2021-10-05 21:59 Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2021-10-05 21:59 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:332c130e3294e22888d6163ed6904bf32b38cec4

commit 332c130e3294e22888d6163ed6904bf32b38cec4
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 17:58:50 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
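
    As a rough illustration of the rule above, the following host-side C
    sketch models which double values qualify and what 32-bit immediate they
    would use.  It is not the GCC implementation: the function name
    model_xxspltidp_immediate is hypothetical, it assumes an IEEE 754 host,
    and letting NaNs through is an assumption based on the new tests.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC.  Return true and store the 32-bit
   single precision image in *IMM if D could be used as an XXSPLTIDP
   immediate under the rules described in the commit message.  */
static bool
model_xxspltidp_immediate (double d, uint32_t *imm)
{
  uint64_t dbits;
  memcpy (&dbits, &d, sizeof dbits);

  /* +0.0 is left to cheaper instructions such as XXSPLTIB or XXLXOR.  */
  if (dbits == 0)
    return false;

  bool is_nan = (d != d);
  bool is_inf_or_nan = ((dbits >> 52) & 0x7ff) == 0x7ff;

  /* Finite values outside the float range cannot be represented.  */
  if (!is_inf_or_nan && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  float f = (float) d;

  /* The value must survive the round trip through single precision.
     NaNs never compare equal, so they are let through (assumption).  */
  if (!is_nan && (double) f != d)
    return false;

  uint32_t fbits;
  memcpy (&fbits, &f, sizeof fbits);

  /* Reject single precision subnormals: exponent 0, mantissa non-zero.  */
  uint32_t exponent = (fbits >> 23) & 0xff;
  uint32_t mantissa = fbits & 0x7fffff;
  if (exponent == 0 && mantissa != 0)
    return false;

  *imm = fbits;
  return true;
}
```

    Under this model 1.0 maps to the immediate 0x3f800000 and -0.0 to
    0x80000000, while M_PI and 0x1p-149 are rejected.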
    
    I added two new constraints (eF and eV) to match scalar and vector constants
    that can be loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
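
    The DImode and V2DImode tests rely on integer constants whose bit pattern
    equals that of a loadable DFmode value.  The sketch below models that
    view on the host, assuming IEEE 754 doubles.  The function name
    model_di_xxspltidp_immediate is hypothetical, and the code only mirrors
    the checks described in this patch (no NaN patterns, no subnormals, value
    exactly representable in single precision).

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC.  Treat BITS as the image of a
   double and report whether it could be loaded with XXSPLTIDP.  */
static bool
model_di_xxspltidp_immediate (uint64_t bits, uint32_t *imm)
{
  /* 1 sign bit, 11 exponent bits, 52 mantissa bits.  */
  uint64_t exponent = (bits >> 52) & 0x7ff;
  uint64_t mantissa = bits & UINT64_C (0xfffffffffffff);

  if (bits == 0)				/* Plain zero is easy already.  */
    return false;
  if (exponent == 0x7ff && mantissa != 0)	/* Looks like a NaN.  */
    return false;
  if (exponent == 0 && mantissa != 0)		/* Double subnormal.  */
    return false;

  double d;
  memcpy (&d, &bits, sizeof d);

  /* Finite values outside the float range cannot be represented.  */
  if (exponent != 0x7ff && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  /* The value must be exactly representable in single precision.  */
  float f = (float) d;
  if ((double) f != d)
    return false;

  uint32_t fbits;
  memcpy (&fbits, &f, sizeof fbits);

  /* Reject single precision subnormals.  */
  if (((fbits >> 23) & 0xff) == 0 && (fbits & 0x7fffff) != 0)
    return false;

  *imm = fbits;
  return true;
}
```

    With this model the patterns 0x3ff0000000000000 (1.0) and
    0x8000000000000000 (-0.0) are accepted, while 0x400921fb54442d18 (PI) and
    the plain integer 1 (a DFmode subnormal) are rejected.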
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eF): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the constant is easy.
            (easy_fp_constant_64bit_scalar): New predicate.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New
            declaration.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (xxspltidp_constant_immediate): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for the
            xxsplti* prefixed instructions.
            (movsf_hardfloat): Add XXSPLTIDP support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
            support.
            (vsx_move<mode>_32bit): Likewise.
            (XXSPLTIDP_S): New mode iterator.
            (XXSPLTIDP_V): Likewise.
            (XXSPLTIDP): Likewise.
            (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
            iterated form that also does SFmode, DFmode, DImode, and
            V2DImode.
            (xxspltidp_<mode>_internal): New insn and splits.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eF and eV constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  10 ++
 gcc/config/rs6000/predicates.md                    | 140 +++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         |  96 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  60 ++++++++-
 gcc/doc/md.texi                                    |   6 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++++++
 14 files changed, 663 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..1ff46c9f4fc 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,11 +208,21 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; DI/SF/DF scalar constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eF"
+  "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_fp_constant_64bit_scalar"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; V2DI/V2DF vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eV"
+  "A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..7544ac87700 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (easy_fp_constant_64bit_scalar (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +614,138 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit scalar constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a V2DF or
+;; V2DI mode result that is interpreted as a 64-bit scalar.
+(define_predicate "easy_fp_constant_64bit_scalar"
+  (match_code "const_int,const_double")
+{
+  const REAL_VALUE_TYPE *rv;
+  REAL_VALUE_TYPE rv_type;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Don't return true for 0.0 or 0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* Handle DImode by creating a DF value from it.  */
+  if (CONST_INT_P (op) && (mode == DImode || mode == VOIDmode))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+
+      /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit
+         floating format has 1 bit for sign, 11 bits for the exponent,
+         and 52 bits for the mantissa.  NaN values have the exponent set
+         to all 1 bits, and the mantissa non-zero (mantissa == 0 is
+         infinity).  */
+      int df_exponent = (df_value >> 52) & 0x7ff;
+      HOST_WIDE_INT df_mantissa = df_value & HOST_WIDE_INT_C (0xfffffffffffff);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* NaN.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers
+         have the exponent all 0 bits, and the mantissa non-zero.  If the
+         value is subnormal, then the hidden bit in the mantissa is not
+         set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+
+      long df_words[2];
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes the target words in target order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      real_from_target (&rv_type, df_words, DFmode);
+      rv = &rv_type;
+    }
+
+  /* Handle SFmode/DFmode constants.  Don't allow decimal or IEEE 128-bit
+     binary constants.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    rv = CONST_DOUBLE_REAL_VALUE (op);
+
+  /* We can't handle anything else with the XXSPLTIDP instruction.  */
+  else
+    return false;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
+;; Return 1 if the operand is a 128-bit vector constant with identical 64-bit
+;; elements that can be loaded via the XXSPLTIDP instruction, which takes a
+;; SFmode value and produces a V2DFmode or V2DImode result.
+;;
+;; We cannot combine the scalar and vector cases, because a combined predicate
+;; would also match an appropriate integer constant assigned to a TImode
+;; value.  I.e.
+;;
+;;	(set (reg:TI 32)
+;;	     (const_int 0x8000000000000000))
+;;
+;; In that case the constant would be splatted into both 64-bit positions in
+;; the vector register, instead of being loaded with the upper 64 bits zero
+;; and the constant in the lower 64 bits.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate")
+{
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (mode != V2DFmode && mode != V2DImode)
+    return false;
+
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    op = XEXP (op, 0);
+
+  else
+    return false;
+
+  return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +790,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..e9be9c4d99f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad860728169..83d243269e3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,60 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the immediate value used in the XXSPLTIDP instruction.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+
+  /* Handle vectors.  */
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  gcc_assert (easy_fp_constant_64bit_scalar (op, mode));
+
+  /* Handle DImode/V2DImode by creating a DF value from it and then converting
+     the DFmode value to SFmode.  */
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+      long df_words[2];
+
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes input in target endian order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      REAL_VALUE_TYPE r;
+      real_from_target (&r, &df_words[0], DFmode);
+      real_to_target (&ret, &r, SFmode);
+    }
+
+  /* For floating point constants, convert to SFmode.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      const REAL_VALUE_TYPE *rv = CONST_DOUBLE_REAL_VALUE (op);
+      real_to_target (&ret, rv, SFmode);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7044,13 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_fp_constant_64bit_scalar (vec, mode)
+	  || easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26785,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case E_DImode:
+    case E_DFmode:
+    case E_SFmode:
+      return easy_fp_constant_64bit_scalar (src, mode);
+
+    case E_V2DImode:
+    case E_V2DFmode:
+      return easy_vector_constant_64bit_element (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..8afc4b2756d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eF,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eF,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..fa33c9d9fbf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;              XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eV,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eV,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6449,15 +6463,53 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP_S [DI SF DF])
+(define_mode_iterator XXSPLTIDP_V [V2DF V2DI])
+(define_mode_iterator XXSPLTIDP   [DI SF DF V2DF V2DI])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and V2DF and V2DI vector constants where both elements are
+;; the same.  The constant has to be expressible as a SFmode constant that is
+;; not a SFmode denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_S 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_S 1 "easy_fp_constant_64bit_scalar" "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_S [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_V 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_V 1 "easy_vector_constant_64bit_element" "eV"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_V [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7fb7b..5035a3fd604 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,9 +3333,15 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eF
+A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eV
+A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.
@ 2021-10-05 21:51 Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2021-10-05 21:51 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:dc4b08de574f12f6eefb4eb6cd5f26151b688f6a

commit dc4b08de574f12f6eefb4eb6cd5f26151b688f6a
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 17:50:20 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
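
    As a rough illustration of the "fits as SFmode" rule (a standalone sketch,
    not code from this patch), the check can be modelled in plain C.  The
    helper name sf_immediate_for_double is hypothetical, and the sketch
    assumes IEEE 754 float/double and deliberately leaves NaNs out, although
    the patch itself does accept some NaN constants:

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): if VALUE could be splatted with
   XXSPLTIDP, store its 32-bit single-precision image in *IMM and return
   true.  Assumes IEEE 754 float and double.  */
static bool
sf_immediate_for_double (double value, uint32_t *imm)
{
  /* Simplification of this sketch: NaNs are not modelled.  */
  if (isnan (value))
    return false;

  /* Finite values outside the float range cannot be represented.  */
  if (!isinf (value) && (value > FLT_MAX || value < -FLT_MAX))
    return false;

  /* The value must survive a round trip through float unchanged.  */
  float f = (float) value;
  if ((double) f != value)
    return false;

  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);

  /* +0.0 is cheaper to create with other instructions.  */
  if (bits == 0)
    return false;

  /* Single-precision subnormals (exponent 0, mantissa non-zero) are
     rejected, as in the predicate added by the patch.  */
  uint32_t exponent = (bits >> 23) & 0xff;
  uint32_t mantissa = bits & 0x7fffff;
  if (exponent == 0 && mantissa != 0)
    return false;

  *imm = bits;
  return true;
}
```

    Under these assumptions 1.0 and -0.0 are accepted, while M_PI (inexact as
    a float) and 0x1p-149 (a float subnormal) are rejected, which matches the
    expectations written in the new DFmode test.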
    
    I added two new constraints (eF and eV) to match scalar and vector constants
    that can be loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
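
    The DI and V2DI tests use integer constants whose bit pattern is the same
    as a DFmode value that XXSPLTIDP can produce.  A minimal standalone sketch
    of that idea follows; the function name di_constant_fits_xxspltidp is
    hypothetical, the code assumes IEEE 754 formats, and it only approximates
    the predicate in the patch:

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of GCC): return true if the 64-bit integer
   DI has the bit pattern of a double that is exactly representable as a
   normal float, so a single XXSPLTIDP could create it.  */
static bool
di_constant_fits_xxspltidp (uint64_t di)
{
  /* IEEE 754 double: 1 sign bit, 11 exponent bits, 52 mantissa bits.  */
  uint64_t exponent = (di >> 52) & 0x7ff;
  uint64_t mantissa = di & UINT64_C (0xfffffffffffff);

  if (di == 0)
    return false;		/* Zero is easy without XXSPLTIDP.  */
  if (exponent == 0x7ff && mantissa != 0)
    return false;		/* Looks like a double NaN.  */
  if (exponent == 0 && mantissa != 0)
    return false;		/* Looks like a double subnormal.  */

  double d;
  memcpy (&d, &di, sizeof d);

  /* Finite values outside the float range cannot be represented.  */
  if (exponent != 0x7ff && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  /* The value must survive a round trip through float unchanged.  */
  float f = (float) d;
  if ((double) f != d)
    return false;

  /* Reject values that would be float subnormals.  */
  uint32_t sf;
  memcpy (&sf, &f, sizeof sf);
  return !(((sf >> 23) & 0xff) == 0 && (sf & 0x7fffff) != 0);
}
```

    With this sketch, 0x8000000000000000 (-0.0) and 0x3ff0000000000000 (1.0)
    qualify, while 1 and 0x400921fb54442d18 (pi) do not, in line with the
    comments in the new DImode test.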
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eF): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the constant is easy.
            (easy_fp_constant_64bit_scalar): New predicate.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New
            declaration.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (xxspltidp_constant_immediate): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for the
            xxsplti* prefixed instructions.
            (movsf_hardfloat): Add XXSPLTIDP support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add XXSPLTIDP
            support.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP_S): New mode iterator.
            (XXSPLTIDP_V): Likewise.
            (XXSPLTIDP): Likewise.
            (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
            iterated form that also does SFmode, DFmode, DImode, and
            V2DImode.
            (xxspltidp_<mode>_internal): New insn and splits.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eF and eV constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  10 ++
 gcc/config/rs6000/predicates.md                    | 140 +++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         |  96 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  60 ++++++++-
 gcc/doc/md.texi                                    |   6 +
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++++++
 13 files changed, 658 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..1ff46c9f4fc 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,11 +208,21 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; DI/SF/DF scalar constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eF"
+  "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_fp_constant_64bit_scalar"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; V2DI/V2DF vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eV"
+  "A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..7544ac87700 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (easy_fp_constant_64bit_scalar (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +614,138 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit scalar constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a V2DF or
+;; V2DI mode result that is interpreted as a 64-bit scalar.
+(define_predicate "easy_fp_constant_64bit_scalar"
+  (match_code "const_int,const_double")
+{
+  const REAL_VALUE_TYPE *rv;
+  REAL_VALUE_TYPE rv_type;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Don't return true for 0.0 or 0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* Handle DImode by creating a DF value from it.  */
+  if (CONST_INT_P (op) && (mode == DImode || mode == VOIDmode))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+
+      /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit
+         floating format has 1 bit for sign, 11 bits for the exponent,
+         and 52 bits for the mantissa.  NaN values have the exponent set
+         to all 1 bits, and the mantissa non-zero (mantissa == 0 is
+         infinity).  */
+      int df_exponent = (df_value >> 52) & 0x7ff;
+      HOST_WIDE_INT df_mantissa = df_value & HOST_WIDE_INT_C (0xfffffffffffff);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* NaN.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers
+         have the exponent all 0 bits, and the mantissa non-zero.  If the
+         value is subnormal, then the hidden bit in the mantissa is not
+         set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+
+      long df_words[2];
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes the target words in target order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      real_from_target (&rv_type, df_words, DFmode);
+      rv = &rv_type;
+    }
+
+  /* Handle SFmode/DFmode constants.  Don't allow decimal or IEEE 128-bit
+     binary constants.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    rv = CONST_DOUBLE_REAL_VALUE (op);
+
+  /* We can't handle anything else with the XXSPLTIDP instruction.  */
+  else
+    return false;  
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
+;; Return 1 if the operand is a vector constant with 64-bit elements that can
+;; be loaded via the XXSPLTIDP instruction, which takes a SFmode value and
+;; produces a V2DFmode or V2DImode result.
+;;
+;; We cannot combine the scalar and vector cases because otherwise it is
+;; problematical if we assign an appropriate integer constant to a TImode
+;; value.  I.e.
+;;
+;;	(set (reg:TI 32)
+;;	     (const_int 0x8000000000000000))
+;;
+;; Otherwise, the constant would be splatted into the 2 64-bit positions in the
+;; vector register, and not loaded with the upper 64-bits 0, and the constant
+;; in the lower 64-bits.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate")
+{
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (mode != V2DFmode && mode != V2DImode)
+    return false;
+
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    op = XEXP (op, 0);
+
+  else
+    return false;
+
+  return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +790,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..e9be9c4d99f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad860728169..83d243269e3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,60 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the immediate value used in the XXSPLTIDP instruction.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+
+  /* Handle vectors.  */
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  gcc_assert (easy_fp_constant_64bit_scalar (op, mode));
+
+  /* Handle DImode/V2DImode by creating a DF value from it and then converting
+     the DFmode value to SFmode.  */
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+      long df_words[2];
+
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes its input in target endian order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      REAL_VALUE_TYPE r;
+      real_from_target (&r, &df_words[0], DFmode);
+      real_to_target (&ret, &r, SFmode);
+    }
+
+  /* For floating point constants, convert to SFmode.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      const REAL_VALUE_TYPE *rv = CONST_DOUBLE_REAL_VALUE (op);
+      real_to_target (&ret, rv, SFmode);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7044,13 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_fp_constant_64bit_scalar (vec, mode)
+	  || easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26785,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case E_DImode:
+    case E_DFmode:
+    case E_SFmode:
+      return easy_fp_constant_64bit_scalar (src, mode);
+
+    case E_V2DImode:
+    case E_V2DFmode:
+      return easy_vector_constant_64bit_element (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..8afc4b2756d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eF,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eF,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..fa33c9d9fbf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;              XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eV,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eV,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6449,15 +6463,53 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP_S [DI SF DF])
+(define_mode_iterator XXSPLTIDP_V [V2DF V2DI])
+(define_mode_iterator XXSPLTIDP   [DI SF DF V2DF V2DI])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and V2DF and V2DI vector constants where both elements are
+;; the same.  The constant has to be expressible as a SFmode constant that is
+;; not a SFmode denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_S 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_S 1 "easy_fp_constant_64bit_scalar" "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_S [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_V 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_V 1 "easy_vector_constant_64bit_element" "eV"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_V [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7fb7b..5035a3fd604 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,9 +3333,15 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eF
+A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eV
+A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
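
The commit message says the XXSPLTIDP immediate is in SFmode format, so only
constants that fit as SFmode values can be loaded, and the vsx.md comment adds
that SFmode denormals are excluded.  The C sketch below illustrates that rule
on the host; it is not the GCC implementation.  The helper names
fits_xxspltidp, fits_xxspltidp_bits and xxspltidp_element are hypothetical,
IEEE 754 float/double are assumed, and NaNs are conservatively rejected here
even though the tests above expect the compiler to accept a quiet NaN.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not a GCC function: return true if the double D can
   be represented exactly as a non-denormal float, and store that float's
   32-bit pattern (the would-be XXSPLTIDP immediate) in *IMM.  */
static bool
fits_xxspltidp (double d, uint32_t *imm)
{
  /* Assumption of this sketch: reject NaNs instead of modelling payloads.  */
  if (d != d)
    return false;

  /* Finite values outside the float range cannot be converted exactly.  */
  if (d != (double) INFINITY && d != -(double) INFINITY
      && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  float f = (float) d;

  /* The value must survive the round trip through float unchanged.  */
  if ((double) f != d)
    return false;

  /* Float denormals are excluded.  */
  if (f != 0.0f && f < FLT_MIN && f > -FLT_MIN)
    return false;

  memcpy (imm, &f, sizeof f);
  return true;
}

/* DImode-style case: treat a 64-bit integer as the bit pattern of a double
   and apply the same test.  */
static bool
fits_xxspltidp_bits (uint64_t pattern, uint32_t *imm)
{
  double d;
  memcpy (&d, &pattern, sizeof d);
  return fits_xxspltidp (d, imm);
}

/* Model of what each 64-bit element holds after the instruction: the
   immediate widened from float to double.  */
static uint64_t
xxspltidp_element (uint32_t imm)
{
  float f;
  memcpy (&f, &imm, sizeof f);
  double d = (double) f;
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}
```

Under these assumptions 1.0 and the integer 0x3ff0000000000000 both map to the
immediate 0x3f800000, while pi, the integer 1, and 0x1p-149 are rejected, which
matches the expectations written in the test comments above.  Zero also passes
this check, although the tests expect XXSPLTIB or XXLXOR to be used for it.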



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.
@ 2021-10-05 21:11 Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2021-10-05 21:11 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4544188d46ae061016d44d77dd2b568c48b36d0f

commit 4544188d46ae061016d44d77dd2b568c48b36d0f
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 17:10:03 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
    
    I added two new constraints (eF and eV) to match scalar and vector constants
    that can be loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eF): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the constant is easy.
            (easy_fp_constant_64bit_scalar): New predicate.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New
            declaration.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (xxspltidp_constant_immediate): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for the
            xxsplti* prefixed instructions.
            (movsf_hardfloat): Add XXSPLTIDP support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add XXSPLTIDP
            support.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTIDP_S): New mode iterator.
            (XXSPLTIDP_V): Likewise.
            (XXSPLTIDP): Likewise.
            (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
            iterated form that also does SFmode, DFmode, DImode, and
            V2DImode.
            (xxspltidp_<mode>_internal): New insn and splits.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eF and eV constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  10 ++
 gcc/config/rs6000/predicates.md                    | 140 +++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         |  96 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  60 ++++++++-
 gcc/doc/md.texi                                    |   6 +
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++++++
 13 files changed, 658 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..1ff46c9f4fc 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,11 +208,21 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; DI/SF/DF scalar constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eF"
+  "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_fp_constant_64bit_scalar"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; V2DI/V2DF vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eV"
+  "A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..7544ac87700 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (easy_fp_constant_64bit_scalar (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +614,138 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit scalar constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a V2DF or
+;; V2DI mode result that is interpreted as a 64-bit scalar.
+(define_predicate "easy_fp_constant_64bit_scalar"
+  (match_code "const_int,const_double")
+{
+  const REAL_VALUE_TYPE *rv;
+  REAL_VALUE_TYPE rv_type;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Don't return true for 0.0 or 0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* Handle DImode by creating a DF value from it.  */
+  if (CONST_INT_P (op) && (mode == DImode || mode == VOIDmode))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+
+      /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit
+         floating format has 1 bit for sign, 11 bits for the exponent,
+         and 52 bits for the mantissa.  NaN values have the exponent set
+         to all 1 bits, and the mantissa non-zero (mantissa == 0 is
+         infinity).  */
+      int df_exponent = (df_value >> 52) & 0x7ff;
+      HOST_WIDE_INT df_mantissa = df_value & ((HOST_WIDE_INT_1 << 52) - 1);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* NaN.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers
+         have the exponent all 0 bits, and the mantissa non-zero.  If the
+         value is subnormal, then the hidden bit in the mantissa is not
+         set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+
+      long df_words[2];
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes the target words in target order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      real_from_target (&rv_type, df_words, DFmode);
+      rv = &rv_type;
+    }
+
+  /* Handle SFmode/DFmode constants.  Don't allow decimal or IEEE 128-bit
+     binary constants.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    rv = CONST_DOUBLE_REAL_VALUE (op);
+
+  /* We can't handle anything else with the XXSPLTIDP instruction.  */
+  else
+    return false;  
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
+;; Return 1 if the operand is a vector constant with 64-bit elements that can
+;; be loaded via the XXSPLTIDP instruction, which takes a SFmode value and
+;; produces a V2DFmode or V2DImode result.
+;;
+;; We cannot combine the scalar and vector cases, because a combined
+;; predicate would mishandle an appropriate integer constant assigned to a
+;; TImode value.  I.e.
+;;
+;;	(set (reg:TI 32)
+;;	     (const_int 0x8000000000000000))
+;;
+;; With a combined predicate, the constant would be splatted into both 64-bit
+;; positions of the vector register, instead of being loaded with the upper
+;; 64 bits zero and the constant in the lower 64 bits.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate")
+{
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (mode != V2DFmode && mode != V2DImode)
+    return false;
+
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    op = XEXP (op, 0);
+
+  else
+    return false;
+
+  return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +790,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..e9be9c4d99f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad860728169..83d243269e3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,60 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the immediate value used in the XXSPLTIDP instruction.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+
+  /* Handle vectors.  */
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  gcc_assert (easy_fp_constant_64bit_scalar (op, mode));
+
+  /* Handle DImode/V2DImode by creating a DF value from it and then converting
+     the DFmode value to SFmode.  */
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+      long df_words[2];
+
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes input in target endian order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      REAL_VALUE_TYPE r;
+      real_from_target (&r, &df_words[0], DFmode);
+      real_to_target (&ret, &r, SFmode);
+    }
+
+  /* For floating point constants, convert to SFmode.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      const REAL_VALUE_TYPE *rv = CONST_DOUBLE_REAL_VALUE (op);
+      real_to_target (&ret, rv, SFmode);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7044,13 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_fp_constant_64bit_scalar (vec, mode)
+	  || easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26785,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case E_DImode:
+    case E_DFmode:
+    case E_SFmode:
+      return easy_fp_constant_64bit_scalar (src, mode);
+
+    case E_V2DImode:
+    case E_V2DFmode:
+      return easy_vector_constant_64bit_element (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..8afc4b2756d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eF,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eF,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..fa33c9d9fbf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;              XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eV,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eV,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6449,15 +6463,53 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP_S [DI SF DF])
+(define_mode_iterator XXSPLTIDP_V [V2DF V2DI])
+(define_mode_iterator XXSPLTIDP   [DI SF DF V2DF V2DI])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and V2DF and V2DI vector constants where both elements are
+;; the same.  The constant has to be expressible as a SFmode constant that is
+;; not a SFmode denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_S 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_S 1 "easy_fp_constant_64bit_scalar" "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_S [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_V 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_V 1 "easy_vector_constant_64bit_element" "eV"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_V [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7fb7b..5035a3fd604 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,9 +3333,15 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eF
+A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eV
+A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
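
The DImode and V2DImode tests above depend on a 64-bit integer having the same bit pattern as a double that XXSPLTIDP can produce. The following is a minimal sketch of that classification in plain C, not the code from the patch: the function name sketch_di_pattern_fits_xxspltidp is hypothetical, and the sketch assumes the host uses IEEE 754 binary32 for float and binary64 for double.

```c
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: return true if the 64-bit
   integer PATTERN, viewed as an IEEE 754 double, is a value that fits
   exactly in a single-precision immediate.  */
static bool
sketch_di_pattern_fits_xxspltidp (uint64_t pattern)
{
  /* 1 sign bit, 11 exponent bits, 52 mantissa bits.  */
  uint64_t exponent = (pattern >> 52) & 0x7ff;
  uint64_t mantissa = pattern & UINT64_C (0xfffffffffffff);

  if (pattern == 0)
    return false;		/* Zero is cheap to create by other means.  */
  if (exponent == 0x7ff)
    return mantissa == 0;	/* Infinity is accepted, NaN is rejected.  */
  if (exponent == 0 && mantissa != 0)
    return false;		/* Double-precision subnormal.  */

  double value;
  memcpy (&value, &pattern, sizeof value);

  /* Reject finite values outside the float range before narrowing.  */
  if (value > FLT_MAX || value < -FLT_MAX)
    return false;

  /* The value must survive a round trip through float unchanged.  */
  float narrowed = (float) value;
  if ((double) narrowed != value)
    return false;

  /* Reject single-precision subnormals (exponent 0, mantissa non-zero).  */
  uint32_t sf_bits;
  memcpy (&sf_bits, &narrowed, sizeof sf_bits);
  if (((sf_bits >> 23) & 0xff) == 0 && (sf_bits & 0x7fffff) != 0)
    return false;

  return true;
}
```

Under these assumptions the sketch accepts 0x8000000000000000 (-0.0) and 0x3ff0000000000000 (1.0) and rejects 0x400921fb54442d18 (pi) and the integer 1 (a double-precision subnormal), which is consistent with the two XXSPLTIDP matches the tests expect.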



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10.
@ 2021-10-04 20:17 Michael Meissner
  0 siblings, 0 replies; 5+ messages in thread
From: Michael Meissner @ 2021-10-04 20:17 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:f62b92fd93d8124fe9773cd3202de6bbf117d656

commit f62b92fd93d8124fe9773cd3202de6bbf117d656
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 4 16:16:58 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and
    V2DF and V2DI vector constants.  The XXSPLTIDP instruction is given a 32-bit
    immediate that is converted to a vector of two DFmode constants.  The immediate
    is in SFmode format, so only constants that fit as SFmode values can be loaded
    with XXSPLTIDP.
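
The "fits as SFmode" condition described above can be illustrated with a small host-side C sketch. This is an illustration under stated assumptions, not the GCC implementation: the name sketch_xxspltidp_immediate is hypothetical, float and double are assumed to be IEEE 754 binary32 and binary64, and NaNs are rejected only to keep the example short (the new tests expect NaN constants to be loadable).

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of the patch: decide whether VALUE could be
   materialized from a 32-bit single-precision immediate, and if so store
   that immediate in *IMM.  */
static bool
sketch_xxspltidp_immediate (double value, uint32_t *imm)
{
  /* Simplification for this sketch: reject NaNs.  */
  if (value != value)
    return false;

  /* Reject finite values outside the float range before narrowing.  */
  if (value != INFINITY && value != -INFINITY
      && (value > FLT_MAX || value < -FLT_MAX))
    return false;

  /* The value must survive a round trip through float unchanged.  */
  float narrowed = (float) value;
  if ((double) narrowed != value)
    return false;

  uint32_t bits;
  memcpy (&bits, &narrowed, sizeof bits);

  /* +0.0 is cheap to create by other means, so it is not reported here.  */
  if (bits == 0)
    return false;

  /* Reject single-precision subnormals: exponent field 0, mantissa
     field non-zero.  */
  uint32_t exponent = (bits >> 23) & 0xff;
  uint32_t mantissa = bits & 0x7fffff;
  if (exponent == 0 && mantissa != 0)
    return false;

  *imm = bits;
  return true;
}
```

With these assumptions, 1.0 yields the immediate 0x3f800000 and -0.0 yields 0x80000000, while the double value of pi is rejected because narrowing it to float loses bits.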
    
    I added two new constraints (eF and eV) to match scalar and vector constants
    that can be loaded with the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector
    constants.
    
    2021-10-04  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eF): New constraint.
            (eV): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the constant is easy.
            (easy_fp_constant_64bit_scalar): New predicate.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New
            declaration.
            (prefixed_xxsplti_p): Likewise.
            * config/rs6000/rs6000.c (xxspltidp_constant_immediate): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for the
            xxsplti* prefixed instructions.
            (movsf_hardfloat): Add XXSPLTIDP support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
            support.
            (vsx_move<mode>_32bit): Likewise.
            (XXSPLTIDP_S): New mode iterator.
            (XXSPLTIDP_V): Likewise.
            (XXSPLTIDP): Likewise.
            (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
            iterated form that also does SFmode, DFmode, DImode, and
            V2DImode.
            (xxspltidp_<mode>_internal): New insn and splits.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eF and eV constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  10 ++
 gcc/config/rs6000/predicates.md                    | 140 +++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         |  96 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  60 ++++++++-
 gcc/doc/md.texi                                    |   6 +
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++++++
 13 files changed, 658 insertions(+), 22 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..1ff46c9f4fc 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,11 +208,21 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; DI/SF/DF scalar constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eF"
+  "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_fp_constant_64bit_scalar"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; V2DI/V2DF vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eV"
+  "A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "easy_vector_constant_64bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..7544ac87700 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (easy_fp_constant_64bit_scalar (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +614,138 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit scalar constant that can be loaded via
+;; the XXSPLTIDP instruction, which takes a SFmode value and produces a V2DF or
+;; V2DI mode result that is interpreted as a 64-bit scalar.
+(define_predicate "easy_fp_constant_64bit_scalar"
+  (match_code "const_int,const_double")
+{
+  const REAL_VALUE_TYPE *rv;
+  REAL_VALUE_TYPE rv_type;
+
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Don't return true for 0.0 or 0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* Handle DImode by creating a DF value from it.  */
+  if (CONST_INT_P (op) && (mode == DImode || mode == VOIDmode))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+
+      /* Avoid values that look like DFmode NaN's.  The IEEE 754 64-bit
+         floating format has 1 bit for sign, 11 bits for the exponent,
+         and 52 bits for the mantissa.  NaN values have the exponent set
+         to all 1 bits, and the mantissa non-zero (mantissa == 0 is
+         infinity).  */
+      int df_exponent = (df_value >> 52) & 0x7ff;
+      HOST_WIDE_INT df_mantissa = df_value & HOST_WIDE_INT_C (0xfffffffffffff);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* NaN.  */
+	return false;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers
+         have the exponent all 0 bits, and the mantissa non-zero.  If the
+         value is subnormal, then the hidden bit in the mantissa is not
+         set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return false;
+
+      long df_words[2];
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes the target words in target order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      real_from_target (&rv_type, df_words, DFmode);
+      rv = &rv_type;
+    }
+
+  /* Handle SFmode/DFmode constants.  Don't allow decimal or IEEE 128-bit
+     binary constants.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    rv = CONST_DOUBLE_REAL_VALUE (op);
+
+  /* We can't handle anything else with the XXSPLTIDP instruction.  */
+  else
+    return false;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return false;
+
+  return true;
+})
+
+;; Return 1 if the operand is a vector constant with 64-bit elements that can
+;; be loaded via the XXSPLTIDP instruction, which takes a SFmode value and
+;; produces a V2DFmode or V2DImode result.
+;;
+;; We cannot combine the scalar and vector cases, because a combined predicate
+;; would mishandle an integer constant that is assigned to a TImode value.
+;; E.g.
+;;
+;;	(set (reg:TI 32)
+;;	     (const_int 0x8000000000000000))
+;;
+;; With a combined predicate, the constant would be splatted into both 64-bit
+;; halves of the vector register, instead of being loaded with the upper 64
+;; bits zero and the constant in the lower 64 bits.
+
+(define_predicate "easy_vector_constant_64bit_element"
+  (match_code "const_vector,vec_duplicate")
+{
+  /* Can we do the XXSPLTIDP instruction?  */
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (mode != V2DFmode && mode != V2DImode)
+    return false;
+
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    op = XEXP (op, 0);
+
+  else
+    return false;
+
+  return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +790,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (easy_vector_constant_64bit_element (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..e9be9c4d99f 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ad860728169..83d243269e3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,60 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the immediate value used in the XXSPLTIDP instruction.  */
+
+long
+xxspltidp_constant_immediate (rtx op, machine_mode mode)
+{
+  long ret;
+
+  /* Handle vectors.  */
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE (op);
+    }
+
+  gcc_assert (easy_fp_constant_64bit_scalar (op, mode));
+
+  /* Handle DImode/V2DImode by creating a DF value from it and then converting
+     the DFmode value to SFmode.  */
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT df_value = INTVAL (op);
+      long df_words[2];
+
+      df_words[0] = (df_value >> 32) & 0xffffffff;
+      df_words[1] = df_value & 0xffffffff;
+
+      /* real_from_target takes input in target endian order.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (df_words[0], df_words[1]);
+
+      REAL_VALUE_TYPE r;
+      real_from_target (&r, &df_words[0], DFmode);
+      real_to_target (&ret, &r, SFmode);
+    }
+
+  /* For floating point constants, convert to SFmode.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      const REAL_VALUE_TYPE *rv = CONST_DOUBLE_REAL_VALUE (op);
+      real_to_target (&ret, rv, SFmode);
+    }
+
+  else
+    gcc_unreachable ();
+
+  return ret;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6990,6 +7044,13 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (easy_fp_constant_64bit_scalar (vec, mode)
+	  || easy_vector_constant_64bit_element (vec, mode))
+	{
+	  operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode));
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26785,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed XXSPLTI* instruction.
+   This is called from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case E_DImode:
+    case E_DFmode:
+    case E_SFmode:
+      return easy_fp_constant_64bit_scalar (src, mode);
+
+    case E_V2DImode:
+    case E_V2DFmode:
+      return easy_vector_constant_64bit_element (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..8afc4b2756d 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7796,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8065,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8093,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8134,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8168,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9220,6 +9228,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
+;;	  XXSPLTIDP
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9227,11 +9236,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
+          ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
+          eF,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9246,6 +9257,7 @@
    lfd%U1%X1 %0,%1
    fmr %0,%1
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9260,17 +9272,20 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
+          vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
+          *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
+          p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9306,6 +9321,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
+;;	   XXSPLTIDP
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9316,6 +9332,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
+	   ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9325,6 +9342,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
+	   eF,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9339,6 +9357,7 @@
    std%U0%X0 %1,%0
    ld%U1%X1 %0,%1
    mr %0,%1
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9365,6 +9384,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
+	   vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9375,6 +9395,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
+	   *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9384,6 +9405,7 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
+	   p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..1d7ce4cc94a 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..fa33c9d9fbf 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;              XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eV,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eV,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6449,15 +6463,53 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP_S [DI SF DF])
+(define_mode_iterator XXSPLTIDP_V [V2DF V2DI])
+(define_mode_iterator XXSPLTIDP   [DI SF DF V2DF V2DI])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode
+;; scalar constants and V2DF and V2DI vector constants where both elements are
+;; the same.  The constant has to be expressible as a SFmode constant that is
+;; not a SFmode denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_S 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_S 1 "easy_fp_constant_64bit_scalar" "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_S [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP_V 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP_V 1 "easy_vector_constant_64bit_element" "eV"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTIDP_V [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  long immediate = xxspltidp_constant_immediate (operands[1], <MODE>mode);
+  operands[2] = GEN_INT (immediate);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 2b41cb7fb7b..5035a3fd604 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,9 +3333,15 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eF
+A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eV
+A 128-bit vector constant that can be loaded with the XXSPLTIDP instruction.
+
 @ifset INTERNALS
 @item G
 A floating point constant that can be loaded into a register with one
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0, which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
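
The tests above depend on the rule stated in the vsx.md comment: a constant is
eligible only if it is expressible as an SFmode constant that is not an SFmode
denormal value.  The following is a rough, standalone sketch of that check in
plain C.  It is not the code of xxspltidp_constant_immediate, whose body is not
shown in this part of the patch; the helper names sf_immediate_for_double and
sf_immediate_for_di are hypothetical and exist only for illustration.  The
sketch assumes IEEE 754 float and double on the host.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the patch): return true and store the
   32-bit single-precision image in *IMMEDIATE if VALUE survives a
   double -> float -> double round trip bit for bit and the float is
   not a denormal.  */
static bool
sf_immediate_for_double (double value, uint32_t *immediate)
{
  assert (immediate != NULL);

  /* Finite values outside the float range cannot be exact; rejecting
     them first also avoids an out-of-range conversion.  */
  if (!isinf (value) && (value > FLT_MAX || value < -FLT_MAX))
    return false;

  float narrowed = (float) value;
  uint32_t bits;
  memcpy (&bits, &narrowed, sizeof bits);

  /* Exponent field zero with a non-zero mantissa is a float denormal.  */
  if ((bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0)
    return false;

  /* Compare bit patterns so that -0.0 and 0.0 are told apart.  */
  double widened = (double) narrowed;
  if (memcmp (&widened, &value, sizeof value) != 0)
    return false;

  *immediate = bits;
  return true;
}

/* Hypothetical helper for the DImode/V2DImode case: reinterpret the
   64-bit integer as a double bit pattern and reuse the check above.  */
static bool
sf_immediate_for_di (uint64_t pattern, uint32_t *immediate)
{
  double value;
  memcpy (&value, &pattern, sizeof value);
  return sf_immediate_for_double (value, immediate);
}
```

Under these assumptions 1.0 and the integer 0x3ff0000000000000 both map to the
immediate 0x3f800000, while M_PI, 0x1p-149 and the integer 1 are rejected.  That
lines up with the cases the tests mark as XXSPLTIDP versus PLFD/PLXV or the
XXSPLTIB sequences.  Zero passes this check too, but the tests expect XXSPLTIB or
XXLXOR for it, so the real predicate presumably defers to those cheaper forms.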



end of thread, other threads:[~2021-10-05 21:59 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-05 21:36 [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTIDP on power10 Michael Meissner
  -- strict thread matches above, loose matches on Subject: below --
2021-10-05 21:59 Michael Meissner
2021-10-05 21:51 Michael Meissner
2021-10-05 21:11 Michael Meissner
2021-10-04 20:17 Michael Meissner
