[gcc(refs/users/meissner/heads/work070)] Generate XXSPLTI32DX on power10.

public inbox for gcc-cvs@sourceware.org

From: Michael Meissner @ 2021-10-05 17:57 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:e13e60d75fde4550cbd6ec189eff4bdd62fee316

commit e13e60d75fde4550cbd6ec189eff4bdd62fee316
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 13:57:16 2021 -0400

    Generate XXSPLTI32DX on power10.
    
    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be loaded with the XXSPLTIDP instruction.  In addition, it adds
    support for using XXSPLTI32DX to load DImode constants and V2DF/V2DI
    constants whose two elements are the same.
    
    At the present time, XXSPLTI32DX is not enabled by default.
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (easy_vector_constant_2insns): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_immediate):
            New declaration.
            * config/rs6000/rs6000.c (xxsplti32dx_constant_immediate): New
            helper function.
            (output_vec_const_move): If the operand can be loaded with
            XXSPLTI32DX, split it.
            (rs6000_output_move_128bit): Likewise.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded with
            XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 ++
 gcc/config/rs6000/predicates.md                    |  62 +++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         |  65 ++++++++++-
 gcc/config/rs6000/rs6000.md                        | 119 ++++++++++++++++-----
 gcc/config/rs6000/rs6000.opt                       |   5 +
 gcc/config/rs6000/vsx.md                           | 107 +++++++++++++++---
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/vec-splat-constant-df-2.c   |  24 +++++
 .../gcc.target/powerpc/vec-splat-constant-di-2.c   |  38 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df-2.c |  24 +++++
 11 files changed, 410 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 46daeb0861c..f9d1d1ab446 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
   (match_operand 0 "easy_fp_constant_64bit_scalar"))
 
+;; SFmode, DFmode, DImode, V2DFmode, or V2DImode constant that can be loaded
+;; with a pair of XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A constant that can be loaded with a pair of XXSPLTI32DX instructions."
+  (match_operand 0 "easy_vector_constant_2insns"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9b9f5934e58..49b0cb2a060 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,11 @@
   if (easy_fp_constant_ieee128 (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can be
+     loaded with a pair of instructions.  */
+  if (easy_vector_constant_2insns (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -751,6 +756,60 @@
   return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
 })
 
+;; Return 1 if the operand is either an SFmode/DFmode/DImode scalar constant or
+;; a V2DImode/V2DFmode vector constant that needs 2 XXSPLTI32DX instructions
+;; to load the value.
+
+(define_predicate "easy_vector_constant_2insns"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  /* Can we do the XXSPLTI32DX instruction?  */
+  if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Convert vector constant/duplicate into a scalar.  */
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (GET_MODE_SIZE (mode) > 8)
+    return false;
+
+  /* 0.0 or 0 is easy to generate.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* If we can load up the constant in other ways (either a single load
+     constant and a direct move or XXSPLTIDP), don't generate the
+     XXSPLTI32DX.  */
+  if (CONST_INT_P (op))
+    return !(satisfies_constraint_I (op)
+             || satisfies_constraint_L (op)
+             || satisfies_constraint_eI (op)
+             || easy_fp_constant_64bit_scalar (op, mode));
+
+  /* For floating point, if we can use XXSPLTIDP, we don't want to
+     generate XXSPLTI32DX's.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    return !easy_fp_constant_64bit_scalar (op, mode);
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can be loaded with the XXSPLTIW
 ;; instruction that loads up a 32-bit immediate and splats it into the vector.
 
@@ -972,6 +1031,9 @@
       if (easy_vector_constant_splat_word (op, mode))
 	return true;
 
+      if (easy_vector_constant_2insns (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
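The integer branch of `easy_vector_constant_2insns` only claims a constant when none of the cheaper loads apply. A minimal C sketch of those range tests follows, assuming the usual rs6000 meanings of the constraints (`I`: signed 16-bit, `L`: signed 16-bit value shifted left 16 bits, `eI`: signed 34-bit). It is not GCC code: the function name is hypothetical, and the extra XXSPLTIDP test (`easy_fp_constant_64bit_scalar`) is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if a 64-bit integer constant is not covered by
   LI (I), LIS (L) or PLI (eI), making it a candidate for a pair of
   XXSPLTI32DX instructions.  The XXSPLTIDP check done by the real
   predicate is not modeled here.  */
static bool
int_needs_xxsplti32dx_pair (int64_t v)
{
  /* 0 is cheap to generate.  */
  if (v == 0)
    return false;

  /* I: signed 16-bit immediate (LI).  */
  if (v >= -32768 && v <= 32767)
    return false;

  /* L: signed 16-bit immediate shifted left 16 bits (LIS).  */
  if ((v & 0xffff) == 0 && v >= INT32_MIN && v <= INT32_MAX)
    return false;

  /* eI: signed 34-bit immediate (PLI).  */
  if (v >= -(INT64_C (1) << 33) && v < (INT64_C (1) << 33))
    return false;

  return true;
}
```

With these rules a value such as `0x123456789abcdef0` falls through every cheaper form, while `1 << 32` is still within PLI's 34-bit range.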
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 540c401e7ad..f517624cc56 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern void xxsplti32dx_constant_immediate (rtx, machine_mode, long *, long *);
 extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern long xxspltiw_constant_immediate (rtx, machine_mode);
 extern int lxvkq_constant_immediate (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 92cd2a1cf87..7a8ce88546f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6951,6 +6951,59 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the two 32-bit constants to use in the two XXSPLTI32DX instructions
+   via HIGH_PTR and LOW_PTR.  */
+
+void
+xxsplti32dx_constant_immediate (rtx op,
+				machine_mode mode,
+				long *high_ptr,
+				long *low_ptr)
+{
+  gcc_assert (easy_vector_constant_2insns (op, mode));
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT value = INTVAL (op);
+      *high_ptr = (value >> 32) & 0xffffffff;
+      *low_ptr = value & 0xffffffff;
+      return;
+    }
+
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (op);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      /* The double precision value is laid out in memory order.  We need to
+	 undo this for XXSPLTI32DX.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (high_low[0], high_low[1]);
+
+      *high_ptr = high_low[0] & 0xffffffff;
+      *low_ptr = high_low[1] & 0xffffffff;
+      return;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Return the immediate value used in the XXSPLTIDP instruction.  */
 
 long
@@ -7234,6 +7287,9 @@ output_vec_const_move (rtx *operands)
 	  return "lxvkq %x0,%2";
 	}
 
+      if (easy_vector_constant_2insns (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -14082,6 +14138,9 @@ rs6000_output_move_128bit (rtx operands[])
       return "lxvkq %x0,%2";
     }
 
+  else if (dest_vsx_p && easy_vector_constant_2insns (src, mode))
+    return "#";
+
   else if (dest_regno >= 0
 	   && (CONST_INT_P (src)
 	       || CONST_WIDE_INT_P (src)
@@ -26996,11 +27055,13 @@ prefixed_xxsplti_p (rtx_insn *insn)
     case E_DImode:
     case E_DFmode:
     case E_SFmode:
-      return easy_fp_constant_64bit_scalar (src, mode);
+      return (easy_fp_constant_64bit_scalar (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V2DImode:
     case E_V2DFmode:
-      return easy_vector_constant_64bit_element (src, mode);
+      return (easy_vector_constant_64bit_element (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V16QImode:
     case E_V8HImode:
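`xxsplti32dx_constant_immediate` reduces every accepted operand to a high and a low 32-bit immediate. The standalone C sketch below mirrors that arithmetic for plain host values, assuming the host `double` is IEEE 754 binary64. The helper names are hypothetical, and the real function works on RTL through `REAL_VALUE_TO_TARGET_DOUBLE` rather than `memcpy`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: split a 64-bit integer constant into the high and low
   32-bit immediates for the two XXSPLTI32DX instructions.  */
static void
split_int64 (int64_t value, uint32_t *high, uint32_t *low)
{
  uint64_t u = (uint64_t) value;
  *high = (uint32_t) (u >> 32);
  *low = (uint32_t) (u & 0xffffffffu);
}

/* Hypothetical: split a double constant by its IEEE 754 bit pattern.
   Copying into a uint64_t gives the numeric bit order (sign and exponent
   in the high word), so no byte swap is needed on the host.  */
static void
split_double (double d, uint32_t *high, uint32_t *low)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  *high = (uint32_t) (bits >> 32);
  *low = (uint32_t) bits;
}
```

For `M_PI` this yields `0x400921fb` and `0x54442d18`. The test file's `SUBNORMAL` constant (`0x1p-149f`, widened to double) splits into `0x36a00000` and 0, so its low word is the zero case that the splitter in vsx.md can start with XXLXOR.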
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8afc4b2756d..5c120ef1672 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7764,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eF"))]
+	  r,         r,         *h,         0,         eF,        eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7797,15 +7797,24 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10,       p10")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8065,18 +8074,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eF"))]
+              r,          Y,          r,          eF,         eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8094,20 +8103,29 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10,        p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8134,19 +8152,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP   XXSPLTI32DX
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+            *h,           r,          <f64_dm>,   wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eF"))]
+             0,           <f64_dm>,   r,          eF,         eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8169,18 +8187,29 @@
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10,        p10")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9228,7 +9257,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
-;;	  XXSPLTIDP
+;;	  XXSPLTIDP  XXSPLTI32DX
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9236,13 +9265,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
-          ^wa,
+          ^wa,       ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
-          eF,
+          eF,        eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9258,6 +9287,7 @@
    fmr %0,%1
    #
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9272,20 +9302,32 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
-          vecperm,
+          vecperm,   vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
-          *,
+          *,         *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
+   (set_attr "num_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
+   (set_attr "max_prefixed_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
-          p10,
+          p10,       p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9321,7 +9363,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
-;;	   XXSPLTIDP
+;;	   XXSPLTIDP   XXSPLTI32DX
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9332,7 +9374,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
-	   ^wa,
+	   ^wa,        ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9342,7 +9384,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
-	   eF,
+	   eF,         eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9358,6 +9400,7 @@
    ld%U1%X1 %0,%1
    mr %0,%1
    #
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9384,7 +9427,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
-	   vecperm,
+	   vecperm,    vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9395,7 +9438,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
-	   *,
+	   *,          *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9403,9 +9446,29 @@
 	   8,          *,
 	   *,          *,          *,
 	   *,          *")
+   (set_attr "num_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
+   (set_attr "max_prefixed_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
-	   p10,
+	   p10,        p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
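In the move patterns above, the new `eD` alternative sits next to the existing `eF` (XXSPLTIDP) one, and `easy_vector_constant_2insns` only accepts an SF/DF constant for `eD` when XXSPLTIDP cannot load it. XXSPLTIDP widens a single-precision immediate to double, so the C sketch below approximates that test by requiring an exact round trip through `float` and a non-subnormal `float` result, matching the test file's note that a float subnormal cannot be loaded with XXSPLTIDP. This is an assumption-laden approximation, not GCC's `easy_fp_constant_64bit_scalar`; the function name is hypothetical and the sketch simply rejects NaN and infinity.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical approximation of "can XXSPLTIDP load this double?".  */
static bool
fits_xxspltidp (double d)
{
  /* NaN and infinity are left out of this sketch.  */
  if (!isfinite (d))
    return false;

  /* Finite values outside float range cannot be single-precision.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return false;

  /* The value must survive a round trip through float unchanged.  */
  float f = (float) d;
  if ((double) f != d)
    return false;

  /* A subnormal single-precision immediate is not usable.  */
  return fpclassify (f) != FP_SUBNORMAL;
}
```

Under this approximation `1.5` qualifies for XXSPLTIDP, while `M_PI` and `0x1p-149` would fall through to the XXSPLTI32DX pair.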
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index a53aad72547..898bc4e9e6e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,11 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+;; Do not enable at this time.
+mxxsplti32dx
+Target Undocumented Var(TARGET_XXSPLTI32DX) Init(0) Save
+Generate (do not generate) XXSPLTI32DX instructions.
+
 mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 712e5df0c02..cc21c454491 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -376,6 +376,7 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
   ])
@@ -1191,19 +1192,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ     XXSPLTI32DX
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1216,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,         *,
+                *,         *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ      XXSPLTI32DX
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1265,27 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,
+                *,         *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
+   (set_attr "num_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
+   (set_attr "max_prefixed_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6570,6 +6581,74 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; XXSPLTI32DX used to create 64-bit constants or vector constants where the
+;; even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [DI SF DF V2DF V2DI])
+
+;; Don't split DImode before register allocation, so that it has a better
+;; chance of winding up in a GPR register.
+(define_split
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand")
+	(match_operand:XXSPLTI32DX 1 "easy_vector_constant_2insns"))]
+  "TARGET_POWER10 && (reload_completed || <MODE>mode != DImode)"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  long high = 0, low = 0;
+
+  xxsplti32dx_constant_immediate (operands[1], <MODE>mode, &high, &low);
+
+  /* If the low bits are 0 or all 1s, initialize that word first.  This way we
+     can use a smaller XXSPLTIB/XXLXOR/XXLORC instruction instead of the first
+     XXSPLTI32DX.  */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+})
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "veclogical,veclogical,vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+
 ;; XXBLEND built-in function support
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
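The splitter chooses which 32-bit word to write first: when the low word is 0 or all ones, that word goes first so a non-prefixed XXLXOR or XXLORC can replace one of the two prefixed XXSPLTI32DX instructions. The C sketch below models only that decision on a raw 64-bit bit pattern. `struct step` and `plan_xxsplti32dx` are hypothetical names, not GCC code, and the sketch keeps each word as an unsigned 32-bit value, so all ones is tested as `0xffffffff`.

```c
#include <assert.h>
#include <stdint.h>

/* One step of the hypothetical plan: a full-register XXLXOR/XXLORC or an
   XXSPLTI32DX writing word index IX (0 = even words, which hold the high
   half of each doubleword; 1 = odd words, which hold the low half).  */
enum step_kind { STEP_XXLXOR, STEP_XXLORC, STEP_XXSPLTI32DX };

struct step
{
  enum step_kind kind;
  int ix;
  uint32_t imm;
};

/* Fill PLAN with the two steps that build the 64-bit pattern VALUE in
   each doubleword and return how many of them are prefixed.  */
static int
plan_xxsplti32dx (uint64_t value, struct step plan[2])
{
  uint32_t high = (uint32_t) (value >> 32);
  uint32_t low = (uint32_t) value;

  assert (value != 0);		/* Zero is rejected by the predicate.  */

  if (low == 0 || low == 0xffffffffu)
    {
      /* Low word first: XXLXOR (all zeros) or XXLORC (all ones) sets every
	 word, then a single XXSPLTI32DX overwrites the even (high) words.  */
      plan[0] = (struct step) { low == 0 ? STEP_XXLXOR : STEP_XXLORC, 1, low };
      plan[1] = (struct step) { STEP_XXSPLTI32DX, 0, high };
      return 1;
    }

  /* General case: high word first, then low word.  */
  plan[0] = (struct step) { STEP_XXSPLTI32DX, 0, high };
  plan[1] = (struct step) { STEP_XXSPLTI32DX, 1, low };
  return 2;
}
```

For the float subnormal in vec-splat-constant-df-2.c (bit pattern 0x36A0000000000000 once widened to double), the low word is 0, so this plan is XXLXOR followed by one XXSPLTI32DX, which matches that test's comment.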
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4ad0e745c94..feaa205291a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with a pair of XXSPLTI32DX instructions.
+
 @item eF
 A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
new file mode 100644
index 00000000000..3b4b4e01d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+double
+df_double_pi (void)
+{
+  return M_PI;			/* 2x XXSPLTI32DX.  */
+}
+
+/* This float subnormal cannot be loaded with XXSPLTIDP.  */
+
+double
+v2df_double_denorm (void)
+{
+  return SUBNORMAL;		/* XXLXOR, XXSPLTI32DX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
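The expected count of three follows from the bit patterns of the two constants. The C sketch below splits a `double` into the high and low 32-bit words that XXSPLTI32DX would load, and approximates the XXSPLTIDP test as "converts exactly to a single-precision float that is not subnormal". That approximation is an assumption for illustration, not GCC's exact rule in `easy_fp_constant_64bit_scalar`; `double_words` and `fits_xxspltidp` are hypothetical names.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Split the IEEE double D into the high and low 32-bit words of its bit
   pattern.  Going through a uint64_t makes this independent of host byte
   order (assuming double and uint64_t share it, as on common hosts).  */
static void
double_words (double d, uint32_t *high, uint32_t *low)
{
  uint64_t bits;

  assert (sizeof d == sizeof bits);
  memcpy (&bits, &d, sizeof bits);
  *high = (uint32_t) (bits >> 32);
  *low = (uint32_t) bits;
}

/* Rough stand-in (an assumption, not GCC's exact rule) for "loadable with
   XXSPLTIDP": D must convert exactly to a float that is not subnormal.  */
static bool
fits_xxspltidp (double d)
{
  /* Reject NaN, infinities and out-of-range values up front so the
     conversion below is always well defined.  */
  if (!(d >= -FLT_MAX && d <= FLT_MAX))
    return false;

  float f = (float) d;
  if ((double) f != d)
    return false;		/* Not exactly representable as a float.  */
  if (f != 0.0f && f > -FLT_MIN && f < FLT_MIN)
    return false;		/* Float subnormal.  */
  return true;
}
```

With these helpers, pi splits into 0x400921FB and 0x54442D18 and fails the float round trip, so it needs two XXSPLTI32DX instructions. 0x1p-149f widens to 0x36A00000 and 0x00000000 and is a float subnormal, so it needs XXLXOR plus one XXSPLTI32DX.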
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
new file mode 100644
index 00000000000..30ad33388e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  We use asm to force the
+   value into vector registers.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+double
+scalar_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = SUBNORMAL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+double
+scalar_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = LARGE_BITS;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
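The test comment calls 0x8000000000000001 a negative subnormal. A short C check of that claim reinterprets the bits as an IEEE double: the sign bit is set, the exponent field is zero and the fraction is 1, so the value is -2^-1074, the negative double of smallest magnitude. Its low word is 1, which is neither 0 nor all ones, so both halves need an XXSPLTI32DX. The helper names `bits_to_double` and `is_subnormal_double` are hypothetical.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as an IEEE double.  */
static double
bits_to_double (uint64_t bits)
{
  double d;

  assert (sizeof d == sizeof bits);
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* True if D is a nonzero subnormal (denormal) double.  */
static bool
is_subnormal_double (double d)
{
  return d != 0.0 && d > -DBL_MIN && d < DBL_MIN;
}
```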
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
new file mode 100644
index 00000000000..8bc119ad41f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+vector double
+v2df_double_pi (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector double) { M_PI, M_PI };
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  /* XXLXOR, XXSPLTI32DX.  */
+  return (vector double) { SUBNORMAL, SUBNORMAL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTI32DX on power10.
@ 2021-10-05 23:31 Michael Meissner
From: Michael Meissner @ 2021-10-05 23:31 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:54e299dcc7d77817fb23f041f7e27a02d9ccc945

commit 54e299dcc7d77817fb23f041f7e27a02d9ccc945
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 19:30:52 2021 -0400

    Generate XXSPLTI32DX on power10.
    
    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be generated with the XXSPLTIDP instruction.  In addition, it adds
    support for using XXSPLTI32DX to load up V2DF constants, where both constants
    are the same.
    
    At the present time, XXSPLTI32DX is not enabled by default.
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (easy_vector_constant_2insns): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_immediate):
            New declaration.
            * config/rs6000/rs6000.c (xxsplti32dx_constant_immediate): New
            helper function.
            (output_vec_const_move): If the operand can be loaded with
            XXSPLTI32DX, split it.
            (rs6000_output_move_128bit): Likewise.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded with
            XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 ++
 gcc/config/rs6000/predicates.md                    |  62 +++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         |  65 ++++++++++-
 gcc/config/rs6000/rs6000.md                        | 119 ++++++++++++++++-----
 gcc/config/rs6000/rs6000.opt                       |   5 +
 gcc/config/rs6000/vsx.md                           | 107 +++++++++++++++---
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/vec-splat-constant-df-2.c   |  24 +++++
 .../gcc.target/powerpc/vec-splat-constant-di-2.c   |  38 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df-2.c |  24 +++++
 .../gcc.target/powerpc/vec-splat-constant-v2di-2.c |  29 +++++
 12 files changed, 439 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 46daeb0861c..f9d1d1ab446 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
   (match_operand 0 "easy_fp_constant_64bit_scalar"))
 
+;; DImode, DFmode, V2DImode, or V2DFmode constant that can be loaded with 2
+;; XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A constant that can be loaded with a pair of XXSPLTI32DX instructions."
+  (match_operand 0 "easy_vector_constant_2insns"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9b9f5934e58..49b0cb2a060 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,11 @@
   if (easy_fp_constant_ieee128 (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can be
+     loaded with a pair of instructions.  */
+  if (easy_vector_constant_2insns (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -751,6 +756,60 @@
   return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
 })
 
+;; Return 1 if the operand is either a DImode/DFmode scalar constant or a
+;; V2DImode/V2DFmode vector constant that needs 2 XXSPLTI32DX instructions to
+;; load the value.
+
+(define_predicate "easy_vector_constant_2insns"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  /* Can we do the XXSPLTI32DX instruction?  */
+  if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Convert vector constant/duplicate into a scalar.  */
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (GET_MODE_SIZE (mode) > 8)
+    return false;
+
+  /* 0.0 or 0 is easy to generate.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* If we can load up the constant in other ways (either a single load
+     immediate plus a direct move, or XXSPLTIDP), don't generate the
+     XXSPLTI32DX pair.  */
+  if (CONST_INT_P (op))
+    return !(satisfies_constraint_I (op)
+             || satisfies_constraint_L (op)
+             || satisfies_constraint_eI (op)
+             || easy_fp_constant_64bit_scalar (op, mode));
+
+  /* For floating point, if we can use XXSPLTIDP, we don't want to
+     generate XXSPLTI32DX's.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    return !easy_fp_constant_64bit_scalar (op, mode);
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can be loaded with the XXSPLTIW
 ;; instruction that loads up a 32-bit immediate and splats it into the vector.
 
@@ -972,6 +1031,9 @@
       if (easy_vector_constant_splat_word (op, mode))
 	return true;
 
+      if (easy_vector_constant_2insns (op, mode))
+	return 1;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
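For CONST_INT operands, `easy_vector_constant_2insns` rejects zero and anything covered by the `I`, `L` or `eI` constraints, and for vectors it requires a duplicated element. A minimal C model of just those checks, assuming the documented meanings of the constraints (`I`: signed 16-bit, `L`: signed 16-bit shifted left 16 bits, `eI`: signed 34-bit). It deliberately omits the `easy_fp_constant_64bit_scalar` (XXSPLTIDP) test and the TARGET_* checks, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* "I": signed 16-bit constant (loadable with LI).  */
static bool
fits_I (int64_t v)
{
  return v >= -32768 && v <= 32767;
}

/* "L": signed 16-bit constant shifted left 16 bits (loadable with LIS).  */
static bool
fits_L (int64_t v)
{
  return (v & 0xffff) == 0 && v >= INT32_MIN && v <= INT32_MAX;
}

/* "eI": signed 34-bit constant (loadable with PLI).  */
static bool
fits_eI (int64_t v)
{
  return v >= -(INT64_C (1) << 33) && v < (INT64_C (1) << 33);
}

/* Scalar DImode: nonzero and not loadable with a single LI/LIS/PLI.  */
static bool
di_needs_two_xxsplti32dx (int64_t v)
{
  return v != 0 && !(fits_I (v) || fits_L (v) || fits_eI (v));
}

/* V2DImode: only a duplicated element can qualify.  */
static bool
v2di_needs_two_xxsplti32dx (int64_t e0, int64_t e1)
{
  return e0 == e1 && di_needs_two_xxsplti32dx (e0);
}
```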
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 540c401e7ad..f517624cc56 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern void xxsplti32dx_constant_immediate (rtx, machine_mode, long *, long *);
 extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern long xxspltiw_constant_immediate (rtx, machine_mode);
 extern int lxvkq_constant_immediate (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f93a7c80801..6bd34444c07 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6951,6 +6951,59 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the two 32-bit constants to use in the two XXSPLTI32DX instructions
+   via HIGH_PTR and LOW_PTR.  */
+
+void
+xxsplti32dx_constant_immediate (rtx op,
+				machine_mode mode,
+				long *high_ptr,
+				long *low_ptr)
+{
+  gcc_assert (easy_vector_constant_2insns (op, mode));
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT value = INTVAL (op);
+      *high_ptr = (value >> 32) & 0xffffffff;
+      *low_ptr = value & 0xffffffff;
+      return;
+    }
+
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (op);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      /* The double precision value is laid out in memory order.  We need to
+	 undo this for XXSPLTI32DX.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (high_low[0], high_low[1]);
+
+      *high_ptr = high_low[0] & 0xffffffff;
+      *low_ptr = high_low[1] & 0xffffffff;
+      return;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Return the immediate value used in the XXSPLTIDP instruction.  */
 
 long
@@ -7234,6 +7287,9 @@ output_vec_const_move (rtx *operands)
 	  return "lxvkq %x0,%2";
 	}
 
+      if (easy_vector_constant_2insns (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -14082,6 +14138,9 @@ rs6000_output_move_128bit (rtx operands[])
       return "lxvkq %x0,%2";
     }
 
+  else if (dest_vsx_p && easy_vector_constant_2insns (src, mode))
+    return "#";
+
   else if (dest_regno >= 0
 	   && (CONST_INT_P (src)
 	       || CONST_WIDE_INT_P (src)
@@ -26996,11 +27055,13 @@ prefixed_xxsplti_p (rtx_insn *insn)
     case E_DImode:
     case E_DFmode:
     case E_SFmode:
-      return easy_fp_constant_64bit_scalar (src, mode);
+      return (easy_fp_constant_64bit_scalar (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V2DImode:
     case E_V2DFmode:
-      return easy_vector_constant_64bit_element (src, mode);
+      return (easy_vector_constant_64bit_element (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V16QImode:
     case E_V8HImode:
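`xxsplti32dx_constant_immediate` receives the double's two 32-bit words in memory order from `REAL_VALUE_TO_TARGET_DOUBLE` and swaps them on little-endian targets, so that the first word is always the high one. A standalone C sketch of that reordering, with the word array supplied directly instead of coming from GCC's real-value routines; `memory_order_to_high_low` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* WORDS holds the two 32-bit halves of a double in target memory order.
   Return them as (HIGH, LOW), the order the two XXSPLTI32DX immediates
   need.  */
static void
memory_order_to_high_low (const uint32_t words[2], bool big_endian,
			  uint32_t *high, uint32_t *low)
{
  assert (words != NULL && high != NULL && low != NULL);

  if (big_endian)
    {
      /* Big endian: the high word is already first in memory.  */
      *high = words[0];
      *low = words[1];
    }
  else
    {
      /* Little endian: the low word comes first, so swap them.  */
      *high = words[1];
      *low = words[0];
    }
}
```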
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8afc4b2756d..5c120ef1672 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7764,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eF"))]
+	  r,         r,         *h,         0,         eF,        eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7797,15 +7797,24 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10,       p10")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8065,18 +8074,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eF"))]
+              r,          Y,          r,          eF,         eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8094,20 +8103,29 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10,        p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8134,19 +8152,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP   XXSPLTI32DX
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+            *h,           r,          <f64_dm>,   wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eF"))]
+             0,           <f64_dm>,   r,          eF,         eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8169,18 +8187,29 @@
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10,        p10")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9228,7 +9257,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
-;;	  XXSPLTIDP
+;;	  XXSPLTIDP  XXSPLTI32DX
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9236,13 +9265,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
-          ^wa,
+          ^wa,       ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
-          eF,
+          eF,        eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9258,6 +9287,7 @@
    fmr %0,%1
    #
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9272,20 +9302,32 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
-          vecperm,
+          vecperm,   vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
-          *,
+          *,         *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
+   (set_attr "num_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         *,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
+   (set_attr "max_prefixed_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         *,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
-          p10,
+          p10,       p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9321,7 +9363,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
-;;	   XXSPLTIDP
+;;	   XXSPLTIDP   XXSPLTI32DX
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9332,7 +9374,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
-	   ^wa,
+	   ^wa,        ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9342,7 +9384,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
-	   eF,
+	   eF,         eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9358,6 +9400,7 @@
    ld%U1%X1 %0,%1
    mr %0,%1
    #
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9384,7 +9427,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
-	   vecperm,
+	   vecperm,    vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9395,7 +9438,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
-	   *,
+	   *,          *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9403,9 +9446,29 @@
 	   8,          *,
 	   *,          *,          *,
 	   *,          *")
+   (set_attr "num_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
+   (set_attr "max_prefixed_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
-	   p10,
+	   p10,        p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index a53aad72547..898bc4e9e6e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,11 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+;; Do not enable at this time.
+mxxsplti32dx
+Target Undocumented Var(TARGET_XXSPLTI32DX) Init(0) Save
+Generate (do not generate) XXSPLTI32DX instructions.
+
 mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 712e5df0c02..cc21c454491 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -376,6 +376,7 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
   ])
@@ -1191,19 +1192,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ     XXSPLTI32DX
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1216,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,         *,
+                *,         *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ      XXSPLTI32DX
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1265,27 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,
+                *,         *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
+   (set_attr "num_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
+   (set_attr "max_prefixed_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6570,6 +6581,74 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; XXSPLTI32DX is used to create 64-bit constants, or vector constants where
+;; the even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [DI SF DF V2DF V2DI])
+
+;; Don't split DImode before register allocation, so that it has a better
+;; chance of winding up in a GPR register.
+(define_split
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand")
+	(match_operand:XXSPLTI32DX 1 "easy_vector_constant_2insns"))]
+  "TARGET_POWER10 && (reload_completed || <MODE>mode != DImode)"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  long high = 0, low = 0;
+
+  xxsplti32dx_constant_immediate (operands[1], <MODE>mode, &high, &low);
+
+  /* If the low word is 0 or all 1s, initialize that word first.  This way we
+     can use a shorter, non-prefixed XXLXOR or XXLORC instruction instead of
+     the first XXSPLTI32DX.  */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+})
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "veclogical,veclogical,vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+
 ;; XXBLEND built-in function support
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4ad0e745c94..feaa205291a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with a pair of XXSPLTI32DX instructions.
+
 @item eF
 A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
new file mode 100644
index 00000000000..3b4b4e01d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+double
+df_double_pi (void)
+{
+  return M_PI;			/* 2x XXSPLTI32DX.  */
+}
+
+/* This float subnormal cannot be loaded with XXSPLTIDP.  */
+
+double
+df_double_denorm (void)
+{
+  return SUBNORMAL;		/* XXLXOR, XXSPLTI32DX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
new file mode 100644
index 00000000000..30ad33388e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  We use asm to force the
+   value into vector registers.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern of a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+double
+scalar_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = SUBNORMAL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+double
+scalar_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = LARGE_BITS;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
new file mode 100644
index 00000000000..8bc119ad41f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+vector double
+v2df_double_pi (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector double) { M_PI, M_PI };
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  /* XXLXOR, XXSPLTI32DX.  */
+  return (vector double) { SUBNORMAL, SUBNORMAL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
new file mode 100644
index 00000000000..2730742752a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern of a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+vector long long
+vector_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { SUBNORMAL, SUBNORMAL };
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+vector long long
+vector_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { LARGE_BITS, LARGE_BITS };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTI32DX on power10.
@ 2021-10-05 22:15 Michael Meissner
From: Michael Meissner @ 2021-10-05 22:15 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:77ef7ba7273c2292af3ab2635014556a66502dc9

commit 77ef7ba7273c2292af3ab2635014556a66502dc9
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Oct 5 18:15:26 2021 -0400

    Generate XXSPLTI32DX on power10.
    
    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be generated with the XXSPLTIDP instruction.  In addition, it adds
    support for using XXSPLTI32DX to load up V2DF constants, where both constants
    are the same.
    
    At the present time, XXSPLTI32DX is not enabled by default.
    
    2021-10-05  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (easy_vector_constant_2insns): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_immediate):
            New declaration.
            * config/rs6000/rs6000.c (xxsplti32dx_constant_immediate): New
            helper function.
            (output_vec_const_move): If the operand can be loaded with
            XXSPLTI32DX, split it.
            (rs6000_output_move_128bit): Likewise.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded with
            XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md  |   6 ++
 gcc/config/rs6000/predicates.md   |  62 ++++++++++++++++++++
 gcc/config/rs6000/rs6000-protos.h |   1 +
 gcc/config/rs6000/rs6000.c        |  65 ++++++++++++++++++++-
 gcc/config/rs6000/rs6000.md       | 119 +++++++++++++++++++++++++++++---------
 gcc/config/rs6000/rs6000.opt      |   5 ++
 gcc/config/rs6000/vsx.md          | 107 +++++++++++++++++++++++++++++-----
 gcc/doc/md.texi                   |   3 +
 8 files changed, 324 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 46daeb0861c..f9d1d1ab446 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
   (match_operand 0 "easy_fp_constant_64bit_scalar"))
 
+;; DImode, DFmode, V2DImode or V2DFmode constant that can be loaded with 2
+;; XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A constant that can be loaded with a pair of XXSPLTI32DX instructions."
+  (match_operand 0 "easy_vector_constant_2insns"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 9b9f5934e58..49b0cb2a060 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,11 @@
   if (easy_fp_constant_ieee128 (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can be
+     loaded with a pair of instructions.  */
+  if (easy_vector_constant_2insns (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -751,6 +756,60 @@
   return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
 })
 
+;; Return 1 if the operand is either a DImode/DFmode scalar constant or a
+;; V2DImode/V2DFmode vector constant that needs 2 XXSPLTI32DX instructions to
+;; load the value.
+
+(define_predicate "easy_vector_constant_2insns"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  /* Can we do the XXSPLTI32DX instruction?  */
+  if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Convert vector constant/duplicate into a scalar.  */
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (GET_MODE_SIZE (mode) > 8)
+    return false;
+
+  /* 0.0 or 0 is easy to generate.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* If we can load up the constant in other ways (either a single load
+     constant and a direct move or XXSPLTIDP), don't generate the
+     XXSPLTI32DX.  */
+  if (CONST_INT_P (op))
+    return !(satisfies_constraint_I (op)
+             || satisfies_constraint_L (op)
+             || satisfies_constraint_eI (op)
+             || easy_fp_constant_64bit_scalar (op, mode));
+
+  /* For floating point, if we can use XXSPLTIDP, we don't want to
+     generate XXSPLTI32DX's.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    return !easy_fp_constant_64bit_scalar (op, mode);
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can be loaded with the XXSPLTIW
 ;; instruction that loads up a 32-bit immediate and splats it into the vector.
 
@@ -972,6 +1031,9 @@
       if (easy_vector_constant_splat_word (op, mode))
 	return true;
 
+      if (easy_vector_constant_2insns (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 540c401e7ad..f517624cc56 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern void xxsplti32dx_constant_immediate (rtx, machine_mode, long *, long *);
 extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern long xxspltiw_constant_immediate (rtx, machine_mode);
 extern int lxvkq_constant_immediate (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f93a7c80801..6bd34444c07 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6951,6 +6951,59 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the two 32-bit constants to use in the two XXSPLTI32DX instructions
+   via HIGH_PTR and LOW_PTR.  */
+
+void
+xxsplti32dx_constant_immediate (rtx op,
+				machine_mode mode,
+				long *high_ptr,
+				long *low_ptr)
+{
+  gcc_assert (easy_vector_constant_2insns (op, mode));
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT value = INTVAL (op);
+      *high_ptr = (value >> 32) & 0xffffffff;
+      *low_ptr = value & 0xffffffff;
+      return;
+    }
+
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (op);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      /* REAL_VALUE_TO_TARGET_DOUBLE returns the two words in target memory
+	 order, so swap them on little endian to put the high word first.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (high_low[0], high_low[1]);
+
+      *high_ptr = high_low[0] & 0xffffffff;
+      *low_ptr = high_low[1] & 0xffffffff;
+      return;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Return the immediate value used in the XXSPLTIDP instruction.  */
 
 long
@@ -7234,6 +7287,9 @@ output_vec_const_move (rtx *operands)
 	  return "lxvkq %x0,%2";
 	}
 
+      if (easy_vector_constant_2insns (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -14082,6 +14138,9 @@ rs6000_output_move_128bit (rtx operands[])
       return "lxvkq %x0,%2";
     }
 
+  else if (dest_vsx_p && easy_vector_constant_2insns (src, mode))
+    return "#";
+
   else if (dest_regno >= 0
 	   && (CONST_INT_P (src)
 	       || CONST_WIDE_INT_P (src)
@@ -26996,11 +27055,13 @@ prefixed_xxsplti_p (rtx_insn *insn)
     case E_DImode:
     case E_DFmode:
     case E_SFmode:
-      return easy_fp_constant_64bit_scalar (src, mode);
+      return (easy_fp_constant_64bit_scalar (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V2DImode:
     case E_V2DFmode:
-      return easy_vector_constant_64bit_element (src, mode);
+      return (easy_vector_constant_64bit_element (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V16QImode:
     case E_V8HImode:
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8afc4b2756d..5c120ef1672 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7764,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eF"))]
+	  r,         r,         *h,         0,         eF,        eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7797,15 +7797,24 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10,       p10")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8065,18 +8074,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eF"))]
+              r,          Y,          r,          eF,         eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8094,20 +8103,29 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10,        p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8134,19 +8152,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP   XXSPLTI32DX
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+            *h,           r,          <f64_dm>,   wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eF"))]
+             0,           <f64_dm>,   r,          eF,         eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8169,18 +8187,29 @@
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10,        p10")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9228,7 +9257,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
-;;	  XXSPLTIDP
+;;	  XXSPLTIDP  XXSPLTI32DX
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9236,13 +9265,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
-          ^wa,
+          ^wa,       ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
-          eF,
+          eF,        eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9258,6 +9287,7 @@
    fmr %0,%1
    #
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9272,20 +9302,32 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
-          vecperm,
+          vecperm,   vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
-          *,
+          *,         *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
+   (set_attr "num_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
+   (set_attr "max_prefixed_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
-          p10,
+          p10,       p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9321,7 +9363,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
-;;	   XXSPLTIDP
+;;	   XXSPLTIDP   XXSPLTI32DX
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9332,7 +9374,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
-	   ^wa,
+	   ^wa,        ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9342,7 +9384,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
-	   eF,
+	   eF,         eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9358,6 +9400,7 @@
    ld%U1%X1 %0,%1
    mr %0,%1
    #
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9384,7 +9427,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
-	   vecperm,
+	   vecperm,    vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9395,7 +9438,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
-	   *,
+	   *,          *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9403,9 +9446,29 @@
 	   8,          *,
 	   *,          *,          *,
 	   *,          *")
+   (set_attr "num_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
+   (set_attr "max_prefixed_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
-	   p10,
+	   p10,        p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index a53aad72547..898bc4e9e6e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,11 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+;; Do not enable at this time.
+mxxsplti32dx
+Target Undocumented Var(TARGET_XXSPLTI32DX) Init(0) Save
+Generate (do not generate) XXSPLTI32DX instructions.
+
 mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 712e5df0c02..cc21c454491 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -376,6 +376,7 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
   ])
@@ -1191,19 +1192,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ     XXSPLTI32DX
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1216,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,         *,
+                *,         *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ      XXSPLTI32DX
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1265,27 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,
+                *,         *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
+   (set_attr "num_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
+   (set_attr "max_prefixed_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6570,6 +6581,74 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; XXSPLTI32DX is used to create 64-bit constants or vector constants where the
+;; even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [DI SF DF V2DF V2DI])
+
+;; Don't split DImode before register allocation, so that it has a better
+;; chance of winding up in a GPR register.
+(define_split
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand")
+	(match_operand:XXSPLTI32DX 1 "easy_vector_constant_2insns"))]
+  "TARGET_POWER10 && (reload_completed || <MODE>mode != DImode)"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  long high = 0, low = 0;
+
+  xxsplti32dx_constant_immediate (operands[1], <MODE>mode, &high, &low);
+
+  /* If the low word is 0 or all 1s, initialize that word first.  This way we
+     can use a smaller XXSPLTIB/XXLXOR/XXLORC instruction instead of the first
+     XXSPLTI32DX.  */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+})
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "veclogical,veclogical,vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+
 ;; XXBLEND built-in function support
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
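To make the two-instruction sequence concrete, here is a minimal C model (not GCC code) of the register contents produced by the splitter above and the two `*xxsplti32dx_<mode>` patterns. It assumes the ISA 3.1 semantics of `XXSPLTI32DX XT,IX,IMM32`: word IX of each doubleword of XT is replaced by IMM32 and the other words are left unchanged. The names `vsr_t`, `xxsplti32dx_model`, `doubleword` and `load_constant` are hypothetical, and the sketch treats "all ones" as the 32-bit value 0xffffffff.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 128-bit VSX register as four 32-bit words in ISA order:
   w[0] is the most significant word of doubleword 0.  */
typedef struct
{
  uint32_t w[4];
} vsr_t;

/* Model of XXSPLTI32DX XT,IX,IMM32: word IX of each doubleword is
   replaced by IMM32; the other two words are left unchanged.  */
static void
xxsplti32dx_model (vsr_t *xt, int ix, uint32_t imm32)
{
  assert (ix == 0 || ix == 1);
  xt->w[ix] = imm32;
  xt->w[2 + ix] = imm32;
}

/* Read doubleword DW (0 or 1) of R back as a 64-bit value.  */
static uint64_t
doubleword (const vsr_t *r, int dw)
{
  return ((uint64_t) r->w[2 * dw] << 32) | r->w[2 * dw + 1];
}

/* Follow the splitter's ordering.  Returns 1 when the cheap first step
   (XXLXOR or XXLORC) is usable, 0 when two XXSPLTI32DX are needed.  */
static int
load_constant (vsr_t *r, uint64_t value)
{
  uint32_t high = (uint32_t) (value >> 32);
  uint32_t low = (uint32_t) value;

  if (low == 0 || low == 0xffffffffu)
    {
      /* XXLXOR sets every word to 0, XXLORC sets every word to all ones.  */
      for (int i = 0; i < 4; i++)
        r->w[i] = low;
      xxsplti32dx_model (r, 0, high);
      return 1;
    }

  xxsplti32dx_model (r, 0, high);
  xxsplti32dx_model (r, 1, low);
  return 0;
}
```

Because XXLXOR/XXLORC overwrite every word, only word 0 of each doubleword is left for a single XXSPLTI32DX to fill, which is why the splitter puts the low word first in that case. Either way, both doublewords end up holding the same 64-bit value, matching the "even elements match and odd elements match" restriction in the comment above.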
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4ad0e745c94..feaa205291a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with a pair of XXSPLTI32DX instructions.
+
 @item eF
 A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.



* [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTI32DX on power10.
@ 2021-10-04 22:22 Michael Meissner
From: Michael Meissner @ 2021-10-04 22:22 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:0a4de651122984294d43f1bdae9e8517f2eee78d

commit 0a4de651122984294d43f1bdae9e8517f2eee78d
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 4 18:21:49 2021 -0400

    Generate XXSPLTI32DX on power10.
    
    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be generated with the XXSPLTIDP instruction.  In addition, it adds
    support for using XXSPLTI32DX to load V2DF constants whose two elements
    are the same.
    
    At the present time, XXSPLTI32DX is not enabled by default.
    
    2021-10-04  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (easy_vector_constant_2insns): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_immediate):
            New declaration.
            * config/rs6000/rs6000.c (xxsplti32dx_constant_immediate): New
            helper function.
            (output_vec_const_move): If the operand can be loaded with
            XXSPLTI32DX, split it.
            (rs6000_output_move_128bit): Likewise.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded with
            XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 ++
 gcc/config/rs6000/predicates.md                    |  62 +++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         |  65 ++++++++++-
 gcc/config/rs6000/rs6000.md                        | 119 ++++++++++++++++-----
 gcc/config/rs6000/rs6000.opt                       |   5 +
 gcc/config/rs6000/vsx.md                           | 107 +++++++++++++++---
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/vec-splat-constant-df-2.c   |  24 +++++
 .../gcc.target/powerpc/vec-splat-constant-di-2.c   |  38 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df-2.c |  24 +++++
 .../gcc.target/powerpc/vec-splat-constant-v2di-2.c |  29 +++++
 12 files changed, 439 insertions(+), 44 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 46daeb0861c..f9d1d1ab446 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction."
   (match_operand 0 "easy_fp_constant_64bit_scalar"))
 
+;; DImode, DFmode, V2DImode, or V2DFmode constant that can be loaded with two
+;; XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A constant that can be loaded with a pair of XXSPLTI32DX instructions."
+  (match_operand 0 "easy_vector_constant_2insns"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 78e64a8a1d4..c8f0d62d75b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,11 @@
   if (easy_fp_constant_ieee128 (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can be
+     loaded with a pair of instructions.  */
+  if (easy_vector_constant_2insns (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -751,6 +756,60 @@
   return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode));
 })
 
+;; Return 1 if the operand is either a DImode/DFmode scalar constant or a
+;; V2DImode/V2DFmode vector constant that needs two XXSPLTI32DX instructions
+;; to load.
+
+(define_predicate "easy_vector_constant_2insns"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  /* Can we do the XXSPLTI32DX instruction?  */
+  if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  /* Convert vector constant/duplicate into a scalar.  */
+  if (CONST_VECTOR_P (op))
+    {
+      if (!CONST_VECTOR_DUPLICATE_P (op))
+	return false;
+
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (GET_MODE_SIZE (mode) > 8)
+    return false;
+
+  /* 0.0 or 0 is easy to generate.  */
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  /* If we can load up the constant in other ways (either a single load
+     constant and a direct move or XXSPLTIDP), don't generate the
+     XXSPLTI32DX.  */
+  if (CONST_INT_P (op))
+    return !(satisfies_constraint_I (op)
+             || satisfies_constraint_L (op)
+             || satisfies_constraint_eI (op)
+             || easy_fp_constant_64bit_scalar (op, mode));
+
+  /* For floating point, if we can use XXSPLTIDP, we don't want to
+     generate XXSPLTI32DX.  */
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    return !easy_fp_constant_64bit_scalar (op, mode);
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can be loaded with the XXSPLTIW
 ;; instruction that loads up a 32-bit immediate and splats it into the vector.
 
@@ -975,6 +1034,9 @@
       if (easy_vector_constant_splat_word (op, mode))
 	return true;
 
+      if (easy_vector_constant_2insns (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
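For CONST_INT operands, `easy_vector_constant_2insns` only accepts values that no single GPR load-immediate can build. The following is a simplified C sketch of that integer branch. It assumes the usual rs6000 meanings of the constraints ('I' is a signed 16-bit constant, 'L' is a signed 16-bit constant shifted left by 16, 'eI' is a signed 34-bit constant for PLI). It also omits the TARGET_* checks and the `easy_fp_constant_64bit_scalar` (XXSPLTIDP) test. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 'I': signed 16-bit constant, loadable with a single LI.  */
static bool
fits_I (int64_t v)
{
  return v >= -32768 && v <= 32767;
}

/* 'L': signed 16-bit constant shifted left by 16, loadable with LIS.  */
static bool
fits_L (int64_t v)
{
  return (v & 0xffff) == 0 && v >= INT32_MIN && v <= INT32_MAX;
}

/* 'eI': signed 34-bit constant, loadable with a single PLI on power10.  */
static bool
fits_eI (int64_t v)
{
  return v >= -(INT64_C (1) << 33) && v < (INT64_C (1) << 33);
}

/* Simplified integer branch of easy_vector_constant_2insns: zero and
   anything a single LI/LIS/PLI can build are left to other patterns.
   The XXSPLTIDP check is deliberately omitted in this sketch.  */
static bool
needs_xxsplti32dx_pair (int64_t v)
{
  if (v == 0)
    return false;
  return !(fits_I (v) || fits_L (v) || fits_eI (v));
}
```

Because the predicate already requires TARGET_PREFIXED, 'eI' subsumes 'I' and 'L' here. Keeping all three tests simply mirrors the order in which the predicate lists them.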
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 540c401e7ad..f517624cc56 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern void xxsplti32dx_constant_immediate (rtx, machine_mode, long *, long *);
 extern long xxspltidp_constant_immediate (rtx, machine_mode);
 extern long xxspltiw_constant_immediate (rtx, machine_mode);
 extern int lxvkq_constant_immediate (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 79123f4e834..f5ca8eb1703 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6946,6 +6946,59 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the two 32-bit constants to use in the two XXSPLTI32DX instructions
+   via HIGH_PTR and LOW_PTR.  */
+
+void
+xxsplti32dx_constant_immediate (rtx op,
+				machine_mode mode,
+				long *high_ptr,
+				long *low_ptr)
+{
+  gcc_assert (easy_vector_constant_2insns (op, mode));
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (CONST_VECTOR_P (op))
+    {
+      op = CONST_VECTOR_ELT (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      op = XEXP (op, 0);
+      mode = GET_MODE_INNER (mode);
+    }
+
+  if (CONST_INT_P (op))
+    {
+      HOST_WIDE_INT value = INTVAL (op);
+      *high_ptr = (value >> 32) & 0xffffffff;
+      *low_ptr = value & 0xffffffff;
+      return;
+    }
+
+  else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode))
+    {
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (op);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      /* The double precision value is laid out in memory order.  We need to
+	 undo this for XXSPLTI32DX.  */
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (high_low[0], high_low[1]);
+
+      *high_ptr = high_low[0] & 0xffffffff;
+      *low_ptr = high_low[1] & 0xffffffff;
+      return;
+    }
+
+  gcc_unreachable ();
+}
+
 /* Return the immediate value used in the XXSPLTIDP instruction.  */
 
 long
@@ -7229,6 +7282,9 @@ output_vec_const_move (rtx *operands)
 	  return "lxvkq %x0,%2";
 	}
 
+      if (easy_vector_constant_2insns (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -14077,6 +14133,9 @@ rs6000_output_move_128bit (rtx operands[])
       return "lxvkq %x0,%2";
     }
 
+  else if (dest_vsx_p && easy_vector_constant_2insns (src, mode))
+    return "#";
+
   else if (dest_regno >= 0
 	   && (CONST_INT_P (src)
 	       || CONST_WIDE_INT_P (src)
@@ -26991,11 +27050,13 @@ prefixed_xxsplti_p (rtx_insn *insn)
     case E_DImode:
     case E_DFmode:
     case E_SFmode:
-      return easy_fp_constant_64bit_scalar (src, mode);
+      return (easy_fp_constant_64bit_scalar (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V2DImode:
     case E_V2DFmode:
-      return easy_vector_constant_64bit_element (src, mode);
+      return (easy_vector_constant_64bit_element (src, mode)
+	      || easy_vector_constant_2insns (src, mode));
 
     case E_V16QImode:
     case E_V8HImode:
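`xxsplti32dx_constant_immediate` obtains the two words of a DF/SF constant with REAL_VALUE_TO_TARGET_DOUBLE. As the patch's comment notes, those words come back in memory order, so they are swapped on little-endian. On a host with IEEE binary64 doubles, the same high/low split can be sketched in plain C by copying the bits into a `uint64_t` and shifting, which does not depend on byte order. The helpers below are hypothetical, and `sp_round_trips` is only a rough stand-in for the XXSPLTIDP test. It does not handle special cases such as single-precision denormals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Split the IEEE double encoding of D into its most significant (HIGH)
   and least significant (LOW) 32-bit words.  Shifting a uint64_t does
   not depend on host byte order, so no swap is needed.  Assumes the
   host double is IEEE binary64.  */
static void
double_to_words (double d, uint32_t *high, uint32_t *low)
{
  uint64_t bits;

  memcpy (&bits, &d, sizeof bits);
  *high = (uint32_t) (bits >> 32);
  *low = (uint32_t) bits;
}

/* Rough stand-in for "XXSPLTIDP can load D": D survives a round trip
   through single precision.  Special cases are not handled.  */
static int
sp_round_trips (double d)
{
  return (double) (float) d == d;
}
```

For example, 1.1 (encoding 0x3FF199999999999A) does not survive the round trip through single precision, so XXSPLTIDP cannot load it. Under this patch it would be a candidate for the XXSPLTI32DX pair with high word 0x3FF19999 and low word 0x9999999A, once -mxxsplti32dx is enabled.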
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 8afc4b2756d..5c120ef1672 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7764,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eF"))]
+	  r,         r,         *h,         0,         eF,        eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7797,15 +7797,24 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10,       p10")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8065,18 +8074,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eF"))]
+              r,          Y,          r,          eF,         eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8094,20 +8103,29 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10,        p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8134,19 +8152,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP   XXSPLTI32DX
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+            *h,           r,          <f64_dm>,   wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eF"))]
+             0,           <f64_dm>,   r,          eF,         eD"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8169,18 +8187,29 @@
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10,        p10")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9228,7 +9257,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
-;;	  XXSPLTIDP
+;;	  XXSPLTIDP  XXSPLTI32DX
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9236,13 +9265,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
-          ^wa,
+          ^wa,       ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
-          eF,
+          eF,        eD,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9258,6 +9287,7 @@
    fmr %0,%1
    #
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9272,20 +9302,32 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
-          vecperm,
+          vecperm,   vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
-          *,
+          *,         *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
+   (set_attr "num_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
+   (set_attr "max_prefixed_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
-          p10,
+          p10,       p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
           p7v")])
@@ -9321,7 +9363,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
-;;	   XXSPLTIDP
+;;	   XXSPLTIDP   XXSPLTI32DX
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9332,7 +9374,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
-	   ^wa,
+	   ^wa,        ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9342,7 +9384,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
-	   eF,
+	   eF,         eD,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9358,6 +9400,7 @@
    ld%U1%X1 %0,%1
    mr %0,%1
    #
+   #
    li %0,%1
    lis %0,%v1
    li %0,%1
@@ -9384,7 +9427,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
-	   vecperm,
+	   vecperm,    vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9395,7 +9438,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
-	   *,
+	   *,          *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9403,9 +9446,29 @@
 	   8,          *,
 	   *,          *,          *,
 	   *,          *")
+   (set_attr "num_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,
+	   *,          *,          *,
+	   *,          *")
+   (set_attr "max_prefixed_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,
+	   *,          *,          *,
+	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
-	   p10,
+	   p10,        p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index a53aad72547..898bc4e9e6e 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,11 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+;; Do not enable at this time.
+mxxsplti32dx
+Target Undocumented Var(TARGET_XXSPLTI32DX) Init(0) Save
+Generate (do not generate) XXSPLTI32DX instructions.
+
 mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 712e5df0c02..cc21c454491 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -376,6 +376,7 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
   ])
@@ -1191,19 +1192,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ     XXSPLTI32DX
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1216,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,         *,
+                *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,         *,
+                *,         *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIDP  XXSPLTIW   LXVKQ
+;;              XXSPLTIDP  XXSPLTIW   LXVKQ      XXSPLTI32DX
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,        wa,
+                wa,        wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eV,        eW,        eQ,
+                eV,        eW,        eQ,        eD,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1265,27 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,
+                *,         *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
+   (set_attr "num_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
+   (set_attr "max_prefixed_insns"
+               "*,         *,         *,         *,         *,         *,
+                *,         *,         *,         2,
+                *,         *,         *,         *,         *,
+                *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,       p10,
+                p10,       p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6570,6 +6581,74 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; XXSPLTI32DX is used to create 64-bit constants, or vector constants where
+;; the even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [DI SF DF V2DF V2DI])
+
+;; Don't split DImode before register allocation, so that it has a better
+;; chance of winding up in a GPR register.
+(define_split
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand")
+	(match_operand:XXSPLTI32DX 1 "easy_vector_constant_2insns"))]
+  "TARGET_POWER10 && (reload_completed || <MODE>mode != DImode)"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  long high = 0, low = 0;
+
+  xxsplti32dx_constant_immediate (operands[1], <MODE>mode, &high, &low);
+
+  /* If the low word is 0 or all 1s, initialize that word first.  This way we
+     can use a 4-byte XXLXOR/XXLORC instruction instead of the first
+     (prefixed) XXSPLTI32DX.  */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+})
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "veclogical,veclogical,vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+
 ;; XXBLEND built-in function support
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
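As a rough illustration of the define_split above, the following C sketch models a VSX register as four 32-bit words and replays the two-step sequence. All names (vsx_reg, model_xxsplti32dx, model_load_splat64, and so on) are hypothetical and not part of GCC. The word order assumes the ISA's big-endian element numbering, where XXSPLTI32DX with IX=0 writes words 0 and 2 and IX=1 writes words 1 and 3, leaving the other words unchanged (which is why the second pattern ties its input to operand 0).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit VSX register: four 32-bit words in
   big-endian element order, so word 0 is the high half of doubleword 0.  */
typedef struct
{
  uint32_t w[4];
} vsx_reg;

/* XXSPLTI32DX XT,IX,IMM32: write IMM32 to words IX and IX+2 and leave the
   other two words unchanged.  IX is a 1-bit field.  */
void
model_xxsplti32dx (vsx_reg *r, int ix, uint32_t imm)
{
  assert (ix == 0 || ix == 1);
  r->w[ix] = imm;
  r->w[ix + 2] = imm;
}

/* A 128-bit constant can be built by two XXSPLTI32DX instructions only if
   the even words match and the odd words match, i.e. both doublewords are
   equal (the V2DF/V2DI case in this patch).  */
int
two_xxsplti32dx_ok (const vsx_reg *c)
{
  return c->w[0] == c->w[2] && c->w[1] == c->w[3];
}

/* Compare two modeled registers word for word.  */
int
vsx_reg_equal (const vsx_reg *a, const vsx_reg *b)
{
  return memcmp (a->w, b->w, sizeof a->w) == 0;
}

/* Follows the ordering in the define_split: when the low word is 0 or all
   ones, set it first, since that step can be an XXLXOR/XXLORC (modeled as
   filling every word), then set the high word with IX=0.  Otherwise set
   the high word (IX=0) and then the low word (IX=1).  Returns the number
   of XXSPLTI32DX instructions used.  */
int
model_load_splat64 (vsx_reg *r, uint64_t value)
{
  uint32_t high = (uint32_t) (value >> 32);
  uint32_t low = (uint32_t) value;

  if (low == 0 || low == UINT32_MAX)
    {
      for (int i = 0; i < 4; i++)
        r->w[i] = low;
      model_xxsplti32dx (r, 0, high);
      return 1;
    }

  model_xxsplti32dx (r, 0, high);
  model_xxsplti32dx (r, 1, low);
  return 2;
}
```

Under this model, 0x12345678ABCDEF01 takes two XXSPLTI32DX instructions, while the double 0x1p-149 (bit pattern 0x36A0000000000000) has a zero low word and takes one after an XXLXOR. That is consistent with the totals of 3 and 4 expected by the dg-final lines in the new tests.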
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4ad0e745c94..feaa205291a 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset
 
+@item eD
+A constant that can be loaded with a pair of XXSPLTI32DX instructions.
+
 @item eF
 A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction.
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
new file mode 100644
index 00000000000..34ec3caa594
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+double
+df_double_pi (void)
+{
+  return M_PI;			/* 2x XXSPLTI32DX.  */
+}
+
+/* This float subnormal cannot be loaded with XXSPLTIDP.  */
+
+double
+df_double_denorm (void)
+{
+  return SUBNORMAL;		/* XXLXOR, XXSPLTI32DX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
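The test comment notes that a float subnormal cannot be loaded with XXSPLTIDP, which splats a 32-bit single-precision immediate widened to double precision. A minimal C sketch of that eligibility condition follows, assuming IEEE 754 binary32/binary64 conversions (C Annex F). The names xxspltidp_ok, float_is_subnormal, and double_from_bits are hypothetical, and GCC's real predicate may treat special values such as NaN differently.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

static_assert (sizeof (double) == sizeof (uint64_t), "expects a 64-bit double");

/* Reinterpret a 64-bit pattern (like the DImode test constants) as a double.  */
double
double_from_bits (uint64_t bits)
{
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Nonzero floats smaller in magnitude than FLT_MIN are subnormal.  */
int
float_is_subnormal (float f)
{
  return f != 0.0f && f > -FLT_MIN && f < FLT_MIN;
}

/* Hypothetical check: the double must round-trip through float exactly,
   and the resulting float must not be subnormal.  A NaN fails the
   round-trip comparison in this sketch.  */
int
xxspltidp_ok (double d)
{
  float f = (float) d;

  if ((double) f != d)
    return 0;
  return !float_is_subnormal (f);
}
```

Under this check, M_PI fails because it is not exact in single precision, 0x1p-149 fails because the float is subnormal, and the bit pattern 0x8000000000000001 used in vec-splat-constant-di-2.c fails because it is a double subnormal far below the float range. All of them fall back to XXSPLTI32DX.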
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
new file mode 100644
index 00000000000..41b1d703fe7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  We use asm to force the
+   value into vector registers.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+double
+scalar_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = SUBNORMAL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+double
+scalar_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = LARGE_BITS;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
new file mode 100644
index 00000000000..3f7b0a00655
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+vector double
+v2df_double_pi (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector double) { M_PI, M_PI };
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  /* XXLXOR, XXSPLTI32DX.  */
+  return (vector double) { SUBNORMAL, SUBNORMAL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
new file mode 100644
index 00000000000..90027378012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+vector long long
+vector_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { SUBNORMAL, SUBNORMAL };
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+vector long long
+vector_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { LARGE_BITS, LARGE_BITS };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */

