public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/users/meissner/heads/work048)] Use XXSPLTI32DX to generate some constants.
@ 2021-04-15 17:56 Michael Meissner
  0 siblings, 0 replies; only message in thread
From: Michael Meissner @ 2021-04-15 17:56 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:e0dfd4a9c73fefb98aaf5102d025d8d7d3cde709

commit e0dfd4a9c73fefb98aaf5102d025d8d7d3cde709
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Apr 15 13:56:07 2021 -0400

    Use XXSPLTI32DX to generate some constants.
    
    This patch generates a pair of XXSPLTI32DX instructions to load 64-bit
    scalar or 128-bit vector constants into the vector registers.
    
    I added a new constraint (eD) for constants that can be loaded with
    XXSPLTI32DX, but cannot be loaded with the XXSPLTIDP or XXSPLTIW
    instructions.
    
    I added a debug switch (-mxxsplti32dx) to control whether this behavior is
    on or off.
    
    For vector moves, I bumped up the size of expanding a vector constant from
    5 instructions (20 bytes) to 6 instructions (24 bytes).  This is to
    accomidate the size of two prefixed instructions.
    
    gcc/
    2021-04-15  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/altivec.me (UNSPEC_XXSPLTI32DX): Move to vsx.md.
            (xxsplti32dx_v4si): Move to vsx.md.
            (xxsplti32dx_v4si_inst): Move to vsx.md.
            (xxsplti32dx_v4sf): Move to vsx.md.
            (xxsplti32dx_v4sf_inst): Move to vsx.md.
            * config/rs6000/contraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the constant with a pair of XXSPLTI32DX instructions, it is easy.
            (xxsplti32dx_operand): New predicate.
            (easy_vector_constant): If we can load the constant with a pair of
            XXSPLTI32DX instructions, it is easy.
            * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
            -mxxsplti32dx.
            (POWERPC_MASKS): Add -mxxsplti32dx.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxsplti32dx support.
            (xxsplti32dx_constant_p): New helper function.
            (output_vec_const_move): Split constants that need XXSPLTI32DX.
            (rs6000_opt_masks): Add -mxxsplti32dx.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            loading constants with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Add support for loading
            constants with XXSPLTI32DX.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Add support for loading
            constants with XXSPLTI32DX.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX): Move unspec here from
            altivec.md.
            (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Bump up size of 'W' vector constants to
            accomidate a pair of XXSPLTI32DX instructions.
            (vsx_mov<mode>_32bit): Bump up size of 'W' vector constants to
            accomidate a pair of XXSPLTI32DX instructions.
            (XXSPLTI32DX): New mode iterator.
            (xxsplti32dx_<mode>): New insn and splits.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            (xxsplti32dx_v4si): Move here from altivec.md.
            (xxsplti32dx_v4si_inst): Move here from altivec.md.
            (xxsplti32dx_v4sf): Move here from altivec.md.
            (xxsplti32dx_v4sf_inst): Move here from altivec.md.
    
    gcc/testsuite/
    2021-04-15  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
            * gcc.target/powerpc/vec-splat-constant-sf.c: Update insn count.
            * gcc.target/powerpc/vec-splat-constant-df.c: Update insn count.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: Update insn
            count.

Diff:
---
 gcc/config/rs6000/altivec.md                       |  60 ---------
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  22 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   2 +
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 133 ++++++++++++++++++-
 gcc/config/rs6000/rs6000.md                        |  44 +++++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 142 ++++++++++++++++++++-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |   5 +-
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  10 +-
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 13 files changed, 354 insertions(+), 86 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad6ead04cfa..9af71e036ab 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -176,7 +176,6 @@
    UNSPEC_VSTRIL
    UNSPEC_SLDB
    UNSPEC_SRDB
-   UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
 ])
@@ -818,65 +817,6 @@
   "vs<SLDB_lr>dbi %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
-(define_expand "xxsplti32dx_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
-		      (match_operand:QI 2 "u1bit_cint_operand" "n")
-		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTI32DX))]
- "TARGET_POWER10"
-{
-  int index = INTVAL (operands[2]);
-
-  if (!BYTES_BIG_ENDIAN)
-    index = 1 - index;
-
-   emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
-					 GEN_INT (index), operands[3]));
-   DONE;
-}
- [(set_attr "type" "vecsimple")])
-
-(define_insn "xxsplti32dx_v4si_inst"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
-		      (match_operand:QI 2 "u1bit_cint_operand" "n")
-		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-  "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
-   (set_attr "prefixed" "yes")])
-
-(define_expand "xxsplti32dx_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
-		      (match_operand:QI 2 "u1bit_cint_operand" "n")
-		      (match_operand:SF 3 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-{
-  int index = INTVAL (operands[2]);
-  long value = rs6000_const_f32_to_i32 (operands[3]);
-  if (!BYTES_BIG_ENDIAN)
-    index = 1 - index;
-
-   emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
-					 GEN_INT (index), GEN_INT (value)));
-   DONE;
-})
-
-(define_insn "xxsplti32dx_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
-		      (match_operand:QI 2 "u1bit_cint_operand" "n")
-		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-  "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
-   (set_attr "prefixed" "yes")])
-
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
 	(unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index e1fadd63580..d665e2a94db 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,12 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; SF/DF/V2DF/DI/V2DI scalar or vector constant that can be loaded with a pair
+;; of XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A vector constant that can be loaded with XXSPLTI32DX instructions."
+  (match_operand 0 "xxsplti32dx_operand"))
+
 ;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
 (define_constraint "eF"
   "A vector constant that can be loaded with the XXSPLTIDP instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8c461ba2b76..01e5e09e0a6 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -606,6 +606,11 @@
   if (xxspltidp_operand (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can
+     be loaded with a pair of those instructions.  */
+  if (xxsplti32dx_operand (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -684,6 +689,20 @@
   return xxspltidp_constant_p (op, mode, &value);
 })
 
+;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
+;; loaded via a pair f ISA 3.1 XXSPLTI32DX instructions.  Do not return true if
+;; the value is 0.0 or it can be loaded with XXSPLTIDP, since that is easy to
+;; generate without using XXSPLTI32DX.
+(define_predicate "xxsplti32dx_operand"
+  (match_code "const_double,const_int,const_vector,vec_duplicate")
+{
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  HOST_WIDE_INT value = 0;
+  return xxsplti32dx_constant_p (op, mode, &value);
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -703,6 +722,9 @@
       if (xxspltidp_operand (op, mode))
 	return true;
 
+      if (xxsplti32dx_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 3b657e490b1..b0821b34a69 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -86,6 +86,7 @@
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
 				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTI32DX		\
 				 | OPTION_MASK_XXSPLTIDP		\
 				 | OPTION_MASK_XXSPLTIW)
 
@@ -163,6 +164,7 @@
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
 				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTI32DX		\
 				 | OPTION_MASK_XXSPLTIDP		\
 				 | OPTION_MASK_XXSPLTIW)
 #endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index e87a51f42de..a13da44cc3c 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 extern bool easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxsplti32dx_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 024eaeecfca..f7f91db5f62 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4481,6 +4481,9 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (TARGET_POWER10 && TARGET_VSX)
     {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTI32DX) == 0)
+	rs6000_isa_flags |= OPTION_MASK_XXSPLTI32DX;
+
       if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
 	rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
 
@@ -4488,7 +4491,9 @@ rs6000_option_override_internal (bool global_init_p)
 	rs6000_isa_flags |= OPTION_MASK_XXSPLTIDP;
     }
   else
-    rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW | OPTION_MASK_XXSPLTIDP);
+    rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW
+			  | OPTION_MASK_XXSPLTIDP
+			  | OPTION_MASK_XXSPLTI32DX);
 
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -6549,6 +6554,128 @@ xxspltidp_constant_p (rtx op,
   return true;
 }
 
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTI32DX instruction.  If the instruction can be synthesized with
+   XXSPLTIDP or is 0/-1, return false.
+
+   Return the 64-bit constant to use in the two XXSPLTI32DX instructions via
+   CONSTANT_PTR.  */
+
+bool
+xxsplti32dx_constant_p (rtx op,
+			machine_mode mode,
+			HOST_WIDE_INT *constant_ptr)
+{
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTI32DX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  rtx element = op;
+  if (mode == V2DFmode || mode == V2DImode)
+    {
+      /* Handle VEC_DUPLICATE and CONST_VECTOR.  */
+      if (GET_CODE (op) == VEC_DUPLICATE)
+	element = XEXP (op, 0);
+
+      else if (GET_CODE (op) == CONST_VECTOR)
+       {
+         element = CONST_VECTOR_ELT (op, 0);
+         if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
+           return false;
+       }
+
+      else
+       return false;
+
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (mode == V4SImode || mode == V4SFmode)
+    {
+      /* For V4SI/V4SF, the XXSPLTI32DX instruction pair can represent vectors
+	 where the two even elements are equal and the two odd elements are
+	 equal.  */
+      if (GET_CODE (op) != CONST_VECTOR)
+	return false;
+
+      rtx op0 = CONST_VECTOR_ELT (op, 0);
+      if (!rtx_equal_p (op0, CONST_VECTOR_ELT (op, 2)))
+	return false;
+
+      rtx op1 = CONST_VECTOR_ELT (op, 1);
+      if (!rtx_equal_p (op1, CONST_VECTOR_ELT (op, 3)))
+	return false;
+
+      if (rtx_equal_p (op0, op1))
+	return false;
+
+      long op0_value;
+      long op1_value;
+      if (mode == V4SImode)
+	{
+	  op0_value = INTVAL (op0);
+	  op1_value = INTVAL (op1);
+	}
+      else
+	{
+	  op0_value = rs6000_const_f32_to_i32 (op0);
+	  op1_value = rs6000_const_f32_to_i32 (op1);
+	}
+
+      *constant_ptr = (op0_value << 32) | (op1_value & 0xffffffff);
+      return true;
+    }
+
+  if (GET_MODE (element) != mode)
+    return false;
+
+  /* Handle floating point constants.  */
+  if (mode == SFmode || mode == DFmode)
+    {
+      HOST_WIDE_INT xxspltidp_value = 0;
+
+      if (!CONST_DOUBLE_P (element))
+	return false;
+
+      if (xxspltidp_constant_p (element, mode, &xxspltidp_value))
+	return false;
+
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      if (!BYTES_BIG_ENDIAN)
+	std::swap (high_low[0], high_low[1]);
+
+      *constant_ptr = (high_low[0] << 32) | (high_low[1] & 0xffffffff);
+      return true;
+    }
+
+  /* Handle integer constants.  */
+  else if (mode == DImode)
+    {
+      if (!CONST_INT_P (element))
+	return false;
+
+      HOST_WIDE_INT value = INTVAL (element);
+      if (value == -1)
+	return false;
+
+      *constant_ptr = value;
+      return true;
+    }
+
+  else
+    return false;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6597,6 +6724,9 @@ output_vec_const_move (rtx *operands)
 	  || xxspltidp_operand (vec, mode))
 	return "#";
 
+      if (xxsplti32dx_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24094,6 +24224,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxsplti32dx",		OPTION_MASK_XXSPLTI32DX,	false, true  },
   { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
   { "xxspltidp",		OPTION_MASK_XXSPLTIDP,		false, true  },
 #ifdef OPTION_MASK_64BIT
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3d4dc820bdd..9e7f507c440 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7612,17 +7612,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eF"))]
+	  r,         r,         *h,         0,         eF,        eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7645,19 +7645,28 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")
+	 *,          *,         *,          *,         p10,       p10")
    (set_attr "prefixed"
 	"*,          *,         *,          *,         *,         *,
 	 *,          *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         yes")])
+	 *,          *,         *,          *,         yes,       yes")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -7917,18 +7926,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eF"))]
+              r,          Y,          r,          eF,         eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -7946,24 +7955,33 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")
+             *,           *,          *,          p10,        p10")
    (set_attr "prefixed"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           *,          *,          yes")])
+             *,           *,          *,          yes,        yes")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 6620cdb7716..bd269369ca0 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Generate (do not generate) the XXSPLTIW instruction.
 mxxspltidp
 Target Undocumented Mask(XXSPLTIDP) Var(rs6000_isa_flags)
 Generate (do not generate) the XXSPLTIDP instruction.
+
+mxxsplti32dx
+Target Undocumented Mask(XXSPLTI32DX) Var(rs6000_isa_flags)
+Generate (do not generate) the XXSPLTI32DX instruction.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f2dfef88c90..16d0555a684 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -370,6 +370,8 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXSPLTIDP
+   UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
   ])
 
 (define_int_iterator XVCVBF16	[UNSPEC_VSX_XVCVSPBF16
@@ -1193,7 +1195,7 @@
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         5,         2,         *,         *")
+                *,         6,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
@@ -1201,7 +1203,7 @@
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
-                *,         20,        8,         *,         *")
+                *,         24,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
@@ -1233,7 +1235,7 @@
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,         24,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
@@ -6326,3 +6328,137 @@
   rs6000_emit_xxspltidp_v2df (operands[0], value);
   DONE;
 })
+
+;; XXSPLTI32DX used to create 64-bit constants or 32-bit vector constants where
+;; the even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [SF DF V4SF V4SI V2DF V2DI])
+
+(define_insn_and_split "*xxsplti32dx_<mode>"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTI32DX 1 "xxsplti32dx_operand"))]
+  "TARGET_XXSPLTI32DX"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  HOST_WIDE_INT value = 0;
+
+  if (!xxsplti32dx_constant_p (operands[1], <MODE>mode, &value))
+    gcc_unreachable ();
+
+  HOST_WIDE_INT high = value >> 32;
+  HOST_WIDE_INT low = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+  /* If the low bits are 0 or all 1s, initialize that word first.  This way we
+     can use a smaller XXSPLTIB instruction instead the first XXSPLTI32DX.  */
+  if (low == 0 || low ==  -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")
+   (set_attr "num_insns" "2")
+   (set_attr "max_prefixed_insns" "2")])
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxspltib %x0,0
+   xxspltib %x0,255
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+;; XXSPLTI32DX built-in support.
+(define_expand "xxsplti32dx_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+   emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
+					 GEN_INT (index), operands[3]));
+   DONE;
+}
+ [(set_attr "type" "vecsimple")])
+
+(define_insn "xxsplti32dx_v4si_inst"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+	(unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
+
+(define_expand "xxsplti32dx_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SF 3 "const_double_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+  long value = rs6000_const_f32_to_i32 (operands[3]);
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+   emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
+					 GEN_INT (index), GEN_INT (value)));
+   DONE;
+})
+
+(define_insn "xxsplti32dx_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+	(unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+		      (match_operand:QI 2 "u1bit_cint_operand" "n")
+		      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
index 8f6e176f9af..1435ef4ef4f 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -48,13 +48,16 @@ scalar_double_m_inf (void)	/* XXSPLTIDP.  */
 double
 scalar_double_pi (void)
 {
-  return M_PI;			/* PLFD.  */
+  return M_PI;			/* 2x XXSPLTI32DX.  */
 }
 
 double
 scalar_double_denorm (void)
 {
-  return 0x1p-149f;		/* PLFD.  */
+  return 0x1p-149f;		/* XXSPLTIB, XXSPLTI32DX.  */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M}   5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not   {\mplfd\M}          } } */
+/* { dg-final { scan-assembler-not   {\mplxsd\M}         } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
index 72504bdfbbd..e9a45d5159d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -57,4 +57,7 @@ scalar_float_denorm (void)
   return 0x1p-149f;		/* PLFS.  */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M}   6 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mplfs\M}          } } */
+/* { dg-final { scan-assembler-not   {\mplxssp\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
index d509459292c..d81198b163d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -51,14 +51,16 @@ v2df_double_m_inf (void)
 vector double
 v2df_double_pi (void)
 {
-  return (vector double) { M_PI, M_PI };		/* PLFD.  */
+  return (vector double) { M_PI, M_PI };		/* 2x XXSPLTI32DX.  */
 }
 
 vector double
 v2df_double_denorm (void)
 {
-  return (vector double) { (double)0x1p-149f,
-			   (double)0x1p-149f };		/* PLFD.  */
+  return (vector double) { (double)0x1p-149f,		/* XXSPLTIB, */
+			   (double)0x1p-149f };		/* XXSPLTI32DX.  */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M}   5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}          } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 06a8289d09b..f0eb982eadf 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -162,4 +162,4 @@ main (int argc, char *argv [])
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */


^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-04-15 17:56 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-15 17:56 [gcc(refs/users/meissner/heads/work048)] Use XXSPLTI32DX to generate some constants Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).