[gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.

* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-19 20:55 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-19 20:55 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:b9326a506d55757e22a44b8e7a79d506d70e9b63

commit b9326a506d55757e22a44b8e7a79d506d70e9b63
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed May 19 16:54:56 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    1 test needed to turn off power10 code generation so that the expected
    instructions would be correct.
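
Editorial illustration, not part of the commit: the constant selection described above can be modelled roughly as follows. This is a minimal sketch that assumes the rules visible in the diff below (0 and -1 use XXSPLTIB, other values in -16..15 use VSPLTISH/VSPLTISW, everything else uses XXSPLTIW) and ignores the register-class check the real patterns perform. The names `splat_insn` and `choose_int_splat` are hypothetical and do not exist in GCC.

```c
#include <stdint.h>

/* Hypothetical labels for the three ways of loading an integer splat.  */
enum splat_insn { SPLAT_XXSPLTIB, SPLAT_VSPLTIS, SPLAT_XXSPLTIW };

/* Simplified model of the choice made for a V8HI/V4SI constant whose
   elements all equal ELEMENT.  Assumption: register class is ignored,
   whereas the real insn only uses VSPLTIS{H,W} for Altivec registers.  */
enum splat_insn
choose_int_splat (int32_t element)
{
  /* All-zeros and all-ones can be loaded into any VSX register.  */
  if (element == 0 || element == -1)
    return SPLAT_XXSPLTIB;

  /* Values that fit the 5-bit signed immediate (-16..15).  */
  if (element >= -16 && element <= 15)
    return SPLAT_VSPLTIS;

  /* Everything else needs the prefixed ISA 3.1 instruction.  */
  return SPLAT_XXSPLTIW;
}
```

Under this model a splat of 1 selects VSPLTIS{H,W} and splats of 126 or 1023 select XXSPLTIW, which matches the annotations in the new vec-splat-constant tests.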
    
    gcc/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000-protos.h (sign_extend_mode_constant): New
            declaration.
            (zero_extend_mode_constant): New declaration.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            (sign_extend_mode_constant): New function.
            (zero_extend_mode_constant): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/pr86731-fwrapv.c: Turn off power10 code
            generation.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000-protos.h                  |   3 +
 gcc/config/rs6000/rs6000.c                         |  77 ++++++++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 151 ++++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   7 +
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 412 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index c407034d58c..117724be1ff 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -283,6 +283,9 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
 extern long rs6000_const_f32_to_i32 (rtx operand);
+extern HOST_WIDE_INT sign_extend_mode_constant (machine_mode, HOST_WIDE_INT);
+extern unsigned HOST_WIDE_INT zero_extend_mode_constant (machine_mode,
+							 HOST_WIDE_INT);
 
 /* Declare functions in rs6000-c.c */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..c313b2ae7e6 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
@@ -27980,6 +27992,65 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+/* Sign extend integer values to a given mode.  */
+HOST_WIDE_INT
+sign_extend_mode_constant (machine_mode mode, HOST_WIDE_INT value)
+{
+  HOST_WIDE_INT mask1;
+  HOST_WIDE_INT mask2;
+
+  switch (mode)
+    {
+    default:
+      gcc_unreachable ();
+
+    case E_QImode:
+      mask1 = HOST_WIDE_INT_C (0xff);
+      mask2 = HOST_WIDE_INT_C (0x80);
+      break;
+
+    case E_HImode:
+      mask1 = HOST_WIDE_INT_C (0xffff);
+      mask2 = HOST_WIDE_INT_C (0x8000);
+      break;
+
+    case E_SImode:
+      mask1 = HOST_WIDE_INT_C (0xffffffff);
+      mask2 = HOST_WIDE_INT_C (0x80000000);
+      break;
+    }
+
+  return (((value & mask1) ^ mask2) - mask2);
+}
+
+/* Zero extend integer values to a given mode.  */
+unsigned HOST_WIDE_INT
+zero_extend_mode_constant (machine_mode mode, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask;
+
+  switch (mode)
+    {
+    default:
+      gcc_unreachable ();
+
+    case E_QImode:
+      mask = HOST_WIDE_INT_UC (0xff);
+      break;
+
+    case E_HImode:
+      mask = HOST_WIDE_INT_UC (0xffff);
+      break;
+
+    case E_SImode:
+      mask = HOST_WIDE_INT_UC (0xffffffff);
+      break;
+    }
+
+  return uvalue & mask;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..a6218d4d8e1 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,123 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and to
+;; use XXSPLTIW to load up vector V8HImode, V4SImode, and V4SFmode vector
+;; constants where all elements are the same.  We special case loading up
+;; integer -16..15 and floating point 0.0f, since we can use the shorter
+;; XXSPLTIB, VSPLTISH, and VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]);
+  unsigned HOST_WIDE_INT uns_value = zero_extend_mode_constant (HImode, value);
+  HOST_WIDE_INT sign_value = sign_extend_mode_constant (HImode, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  HOST_WIDE_INT new_value = (uns_value << 16) | uns_value;
+  operands[2] = GEN_INT (zero_extend_mode_constant (SImode, new_value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]);
+  unsigned HOST_WIDE_INT uns_value = zero_extend_mode_constant (SImode, value);
+  HOST_WIDE_INT sign_value = sign_extend_mode_constant (SImode, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltisw %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  operands[2] = GEN_INT (uns_value);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  /* The assembler doesn't like negative values.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (zero_extend_mode_constant (SImode, value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  operands[2] = CONST_VECTOR_ELT (operands[1], 0);
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..8a00aaca70d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -8,6 +8,13 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-maltivec -O3 -fwrapv " } */
 
+/* If the compiler is generating power10 instructions, turn it off.  Otherwise,
+   it will generate a XXSPLTIW instruction instead of LXV/LXVD2X.  */
+
+#ifdef _ARCH_PWR10
+#pragma GCC target ("cpu=power9")
+#endif
+
 #include <altivec.h>
 /* original test as reported.  */
 vector unsigned int splat(void)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-
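
Editorial illustration, not part of the commit: the output templates in vsx.md above form the XXSPLTIW immediate by zero-extending the element, replicating a 16-bit element into both halves of the word for V8HI, and using the raw bit pattern of the float for V4SF. The sketch below reproduces that arithmetic in portable C, assuming a host whose `float` is 32-bit IEEE 754. The function names are hypothetical stand-ins, not the GCC functions `sign_extend_mode_constant` and `zero_extend_mode_constant`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Sign-extend the low BITS bits of VALUE with the mask/xor/subtract idiom
   used in the patch: ((value & mask1) ^ mask2) - mask2.  */
int64_t
sign_extend_bits (int64_t value, int bits)
{
  assert (bits > 0 && bits < 64);
  uint64_t mask = (UINT64_C (1) << bits) - 1;
  uint64_t sign = UINT64_C (1) << (bits - 1);
  uint64_t low = (uint64_t) value & mask;
  return (int64_t) (low ^ sign) - (int64_t) sign;
}

/* Zero-extend the low BITS bits of VALUE.  */
uint64_t
zero_extend_bits (int64_t value, int bits)
{
  assert (bits > 0 && bits < 64);
  return (uint64_t) value & ((UINT64_C (1) << bits) - 1);
}

/* V8HI case: copy the 16-bit element into both halves of the 32-bit
   immediate so that every halfword of the splatted word matches.  */
uint32_t
v8hi_splat_immediate (int16_t element)
{
  uint32_t half = (uint32_t) zero_extend_bits (element, 16);
  return (half << 16) | half;
}

/* V4SF case: the immediate is the bit pattern of the float, kept
   unsigned because the assembler rejects negative immediates.
   Assumption: float is 32-bit IEEE 754 on the host.  */
uint32_t
v4sf_splat_immediate (float element)
{
  uint32_t bits;
  assert (sizeof bits == sizeof element);
  memcpy (&bits, &element, sizeof bits);
  return bits;
}
```

For instance, a V8HI splat of 1023 would yield the immediate 0x03ff03ff, and a V4SF splat of -0.0f would yield 0x80000000, which is why the tests expect XXSPLTIW rather than XXSPLTIB for negative zero.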



* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-20 14:25 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-20 14:25 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:2ea55cd4aba018853b8052385f56030589bc8cb8

commit 2ea55cd4aba018853b8052385f56030589bc8cb8
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu May 20 10:24:38 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    1 test needed to turn off power10 code generation so that the expected
    instructions would be correct.
    
    gcc/
    2021-05-20  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            * config/rs6000/rs6000.h (SIGN_EXTEND_8BIT): New macro.
            (SIGN_EXTEND_16BIT): New macro.
            (SIGN_EXTEND_32BIT): New macro.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-20  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/pr86731-fwrapv.c: Turn off power10 code
            generation.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000.c                         |  18 ++-
 gcc/config/rs6000/rs6000.h                         |  19 +++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 146 ++++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   7 +
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 ++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 364 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..f0984e9fec5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b0e1953aeae..c1a811feec8 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2623,3 +2623,22 @@ while (0)
        rs6000_asm_output_opcode (STREAM);				\
     }									\
   while (0)
+
+/* Provide macros for sign-extending values.  */
+#if HOST_BITS_PER_CHAR == 8
+#define SIGN_EXTEND_8BIT(X) ((HOST_WIDE_INT)(signed char)(X))
+#else
+#define SIGN_EXTEND_8BIT(X) ((((X) & 0xff) ^ 0x80) - 0x80)
+#endif
+
+#if HOST_BITS_PER_SHORT == 16
+#define SIGN_EXTEND_16BIT(X) ((HOST_WIDE_INT)(short)(X))
+#else
+#define SIGN_EXTEND_16BIT(X) ((((X) & 0xffff) ^ 0x8000) - 0x8000)
+#endif
+
+#if HOST_BITS_PER_INT == 32
+#define SIGN_EXTEND_32BIT(X) ((HOST_WIDE_INT)(int)(X))
+#else
+#define SIGN_EXTEND_32BIT(X) ((((X) & 0xffffffff) ^ 0x80000000) - 0x80000000)
+#endif
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..76e10f73dec 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,118 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and
+;; use XXSPLTIW to load up V8HImode, V4SImode, and V4SFmode vector
+;; constants where all elements are the same.  We special case loading up
+;; integer -16..15 and floating point 0.0f, since we can use the shorter
+;; XXSPLTIB, VSPLTISH, and VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = SIGN_EXTEND_16BIT (INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%2";
+    }
+
+  HOST_WIDE_INT uns_value = sign_value & 0xffff;
+  operands[2] = GEN_INT ((uns_value << 16) | uns_value);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = SIGN_EXTEND_32BIT (INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltisw %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  operands[2] = GEN_INT (sign_value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  /* The assembler doesn't like negative values.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  operands[2] = CONST_VECTOR_ELT (operands[1], 0);
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..8a00aaca70d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -8,6 +8,13 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-maltivec -O3 -fwrapv " } */
 
+/* If the compiler is generating power10 instructions, turn it off.  Otherwise,
+   it will generate a XXSPLTIW instruction instead of LXV/LXVD2X.  */
+
+#ifdef _ARCH_PWR10
+#pragma GCC target ("cpu=power9")
+#endif
+
 #include <altivec.h>
 /* original test as reported.  */
 vector unsigned int splat(void)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-


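The immediate arithmetic in the insn patterns above can be tried outside the compiler. The following is a minimal standalone C sketch, not code from the patch: the helper names `sign_extend_bits`, `xxspltiw_imm_from_hi`, and `xxspltiw_imm_from_si` are hypothetical, and the sketch assumes a two's complement host with 64-bit `int64_t`. It shows the xor/subtract sign-extension idiom used by the `SIGN_EXTEND_*BIT` fallback macros, and how a 16-bit element is replicated into both halves of the 32-bit XXSPLTIW immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of X with the xor/subtract idiom.
   The arithmetic is done in unsigned 64-bit to avoid signed overflow;
   the final conversion assumes a two's complement host.  */
static int64_t
sign_extend_bits (int64_t x, unsigned bits)
{
  assert (bits >= 1 && bits <= 63);
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t low = (uint64_t) x & ((sign << 1) - 1);
  return (int64_t) ((low ^ sign) - sign);
}

/* Immediate for splatting a 16-bit element: the halfword is copied
   into both halves of the 32-bit word, and the result is unsigned
   because the assembler does not accept negative immediates here.  */
static uint32_t
xxspltiw_imm_from_hi (int64_t value)
{
  uint32_t low = (uint32_t) value & 0xffffu;
  return (low << 16) | low;
}

/* Immediate for splatting a 32-bit element: just the low 32 bits,
   again as an unsigned value.  */
static uint32_t
xxspltiw_imm_from_si (int64_t value)
{
  return (uint32_t) ((uint64_t) value & 0xffffffffu);
}
```

Under these assumptions, `xxspltiw_imm_from_hi (1023)` yields `0x03ff03ff`, which is the word a V8HI splat of 1023 would need, and `xxspltiw_imm_from_si (-17)` yields `0xffffffef`.
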

* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-19 20:01 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-19 20:01 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3f21da04e6957c4783f4219a7d392cceb3f2e435

commit 3f21da04e6957c4783f4219a7d392cceb3f2e435
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed May 19 16:01:08 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    1 test needed to turn off power10 code generation so that the expected
    instructions would be correct.
    
    gcc/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000-protos.h (sign_extend_mode_constant): New
            declaration.
            (zero_extend_mode_constant): New declaration.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            (sign_extend_mode_constant): New function.
            (zero_extend_mode_constant): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/pr86731-fwrapv.c: Turn off power10 code
            generation.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

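The integer `*_dup` insns listed in the ChangeLog choose among several splat instructions depending on the constant and the destination register. A minimal C sketch of that decision follows; it is an illustration, not code from the patch. The enum and the function `choose_splat_insn` are hypothetical names, and the `-16..15` test stands in for GCC's `EASY_VECTOR_15` range as described in the patch comments. It mirrors only the output-template logic; in the real patterns the register constraints and the `xxspltiw_operand` predicate restrict which cases can actually reach each branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Which instruction family loads the splat constant.  */
enum splat_insn { SPLAT_XXSPLTIB, SPLAT_VSPLTIS, SPLAT_XXSPLTIW };

/* SIGN_VALUE is the element value after sign extension to its mode.
   ALTIVEC_REG says whether the destination is an Altivec register,
   which VSPLTISH/VSPLTISW require.  */
static enum splat_insn
choose_splat_insn (int64_t sign_value, bool altivec_reg)
{
  /* The element must fit in 32 bits once sign extended.  */
  assert (sign_value >= INT32_MIN && sign_value <= INT32_MAX);

  /* All-zeros and all-ones work with XXSPLTIB in any VSX register.  */
  if (sign_value == 0 || sign_value == -1)
    return SPLAT_XXSPLTIB;

  /* Small constants use the shorter, non-prefixed VSPLTIS{H,W}.  */
  if (altivec_reg && sign_value >= -16 && sign_value <= 15)
    return SPLAT_VSPLTIS;

  /* Everything else needs the prefixed XXSPLTIW.  */
  return SPLAT_XXSPLTIW;
}
```

For example, a splat of 126 selects `SPLAT_XXSPLTIW` regardless of register class, while a splat of 1 in an Altivec register selects `SPLAT_VSPLTIS`, matching the expectations annotated in the new tests.
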
Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000-protos.h                  |   3 +
 gcc/config/rs6000/rs6000.c                         |  73 +++++++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 151 ++++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   7 +
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 408 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index c407034d58c..117724be1ff 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -283,6 +283,9 @@ extern void rs6000_asm_output_dwarf_pcrel (FILE *file, int size,
 extern void rs6000_asm_output_dwarf_datarel (FILE *file, int size,
 					     const char *label);
 extern long rs6000_const_f32_to_i32 (rtx operand);
+extern HOST_WIDE_INT sign_extend_mode_constant (machine_mode, HOST_WIDE_INT);
+extern unsigned HOST_WIDE_INT zero_extend_mode_constant (machine_mode,
+							 HOST_WIDE_INT);
 
 /* Declare functions in rs6000-c.c */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..1b65e729017 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
@@ -27980,6 +27992,61 @@ rs6000_output_addr_vec_elt (FILE *file, int value)
   fprintf (file, "\n");
 }
 
+/* Sign extend integer values to a given mode.  */
+HOST_WIDE_INT
+sign_extend_mode_constant (machine_mode mode, HOST_WIDE_INT value)
+{
+  HOST_WIDE_INT mask;
+
+  switch (mode)
+    {
+    default:
+      gcc_unreachable ();
+
+    case E_QImode:
+      mask = HOST_WIDE_INT_C (0x80);
+      break;
+
+    case E_HImode:
+      mask = HOST_WIDE_INT_C (0x8000);
+      break;
+
+    case E_SImode:
+      mask = HOST_WIDE_INT_C (0x80000000);
+      break;
+    }
+
+  return ((value ^ mask) - mask);
+}
+
+/* Zero extend integer values to a given mode.  */
+unsigned HOST_WIDE_INT
+zero_extend_mode_constant (machine_mode mode, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask;
+
+  switch (mode)
+    {
+    default:
+      gcc_unreachable ();
+
+    case E_QImode:
+      mask = HOST_WIDE_INT_UC (0xff);
+      break;
+
+    case E_HImode:
+      mask = HOST_WIDE_INT_UC (0xffff);
+      break;
+
+    case E_SImode:
+      mask = HOST_WIDE_INT_UC (0xffffffff);
+      break;
+    }
+
+  return uvalue & mask;
+}
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rs6000.h"
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..a6218d4d8e1 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,123 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and
+;; use XXSPLTIW to load up V8HImode, V4SImode, and V4SFmode vector
+;; constants where all elements are the same.  We special case loading up
+;; integer -16..15 and floating point 0.0f, since we can use the shorter
+;; XXSPLTIB, VSPLTISH, and VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]);
+  unsigned HOST_WIDE_INT uns_value = zero_extend_mode_constant (HImode, value);
+  HOST_WIDE_INT sign_value = sign_extend_mode_constant (HImode, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  HOST_WIDE_INT new_value = (uns_value << 16) | uns_value;
+  operands[2] = GEN_INT (zero_extend_mode_constant (SImode, new_value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]);
+  unsigned HOST_WIDE_INT uns_value = zero_extend_mode_constant (SImode, value);
+  HOST_WIDE_INT sign_value = sign_extend_mode_constant (SImode, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltisw %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  operands[2] = GEN_INT (uns_value);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  /* The assembler doesn't like negative values.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (zero_extend_mode_constant (SImode, value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  operands[2] = CONST_VECTOR_ELT (operands[1], 0);
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..8a00aaca70d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -8,6 +8,13 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-maltivec -O3 -fwrapv " } */
 
+/* If the compiler is generating power10 instructions, turn it off.  Otherwise,
+   it will generate a XXSPLTIW instruction instead of LXV/LXVD2X.  */
+
+#ifdef _ARCH_PWR10
+#pragma GCC target ("cpu=power9")
+#endif
+
 #include <altivec.h>
 /* original test as reported.  */
 vector unsigned int splat(void)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-



* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-19 16:38 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-19 16:38 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:0545b27c5141d8d8468ac5db799c2a55c1457fff

commit 0545b27c5141d8d8468ac5db799c2a55c1457fff
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed May 19 12:38:41 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    1 test needed to turn off power10 code generation so that the expected
    instructions would be correct.
    
    gcc/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            * config/rs6000/rs6000.h (sign_extend_n_bit): New inline
            function.
            (zero_extend_n_bit): New inline function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/pr86731-fwrapv.c: Turn off power10 code
            generation.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000.c                         |  18 ++-
 gcc/config/rs6000/rs6000.h                         |  23 ++++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 148 ++++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   7 +
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 370 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
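
The checks made by xxspltiw_operand above can be pictured on a plain C array.
The following is a minimal sketch only, not GCC code: the helper names
easy_vector_15 and splat_needs_xxspltiw are hypothetical, only 32-bit integer
elements are modeled, and the TARGET_XXSPLTIW and mode checks are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for EASY_VECTOR_15: the 5-bit signed range that
   VSPLTISH/VSPLTISW can splat directly.  */
static bool
easy_vector_15 (int32_t n)
{
  return n >= -16 && n <= 15;
}

/* Return true if all N elements are equal and the common value is not one
   that a shorter splat instruction could load (0 and the rest of -16..15).  */
static bool
splat_needs_xxspltiw (const int32_t *elts, size_t n)
{
  if (n == 0)
    return false;

  /* Every element must match the first one.  */
  for (size_t i = 1; i < n; i++)
    if (elts[i] != elts[0])
      return false;

  /* Small values, including zero, are left to the shorter instructions.  */
  if (easy_vector_15 (elts[0]))
    return false;

  return true;
}
```

Under these assumptions { 126, 126, 126, 126 } qualifies, while
{ 1, 1, 1, 1 } and any vector with differing elements do not, which matches
the expectations written into the new vec-splat-constant tests.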
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..f0984e9fec5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
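
The option handling added to rs6000_option_override_internal above turns
-mxxspltiw on by default for power10 with VSX unless the user set the option
explicitly, and forces it off when either prerequisite is missing.  A small
sketch of that decision, assuming a hypothetical mask bit and function name
(resolve_xxspltiw is not a GCC function):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real OPTION_MASK_XXSPLTIW is generated
   from rs6000.opt.  */
#define MASK_XXSPLTIW (UINT64_C (1) << 0)

/* FLAGS holds the current option bits, EXPLICIT_FLAGS the bits the user
   set on the command line.  Return the adjusted option bits.  */
static uint64_t
resolve_xxspltiw (uint64_t flags, uint64_t explicit_flags,
		  bool power10, bool vsx)
{
  if (power10 && vsx && (explicit_flags & MASK_XXSPLTIW) == 0)
    flags |= MASK_XXSPLTIW;	/* Default on when not set explicitly.  */
  else if (!power10 || !vsx)
    flags &= ~MASK_XXSPLTIW;	/* Prerequisites missing: force off.  */

  return flags;
}
```

In this sketch an explicit -mno-xxspltiw on power10 is respected because the
explicit bit is set while the flag bit is clear, so neither branch changes it.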
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b0e1953aeae..61442855f27 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2623,3 +2623,26 @@ while (0)
        rs6000_asm_output_opcode (STREAM);				\
     }									\
   while (0)
+
+#ifdef HOST_WIDE_INT
+/* Sign extend integer values to a given bit size.  */
+static inline HOST_WIDE_INT
+sign_extend_n_bit (int n_bits, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U << (n_bits - 1);
+
+  return (HOST_WIDE_INT) ((uvalue ^ mask) - mask);
+}
+
+/* Zero extend integer values to a given bit size.  */
+static inline unsigned HOST_WIDE_INT
+zero_extend_n_bit (int n_bits, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask
+    = (HOST_WIDE_INT_1U << (n_bits + 1)) - HOST_WIDE_INT_1U;
+
+  return uvalue & mask;
+}
+#endif
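
The sign extension above uses the XOR-and-subtract idiom: flipping the sign
bit and then subtracting it propagates that bit into the upper bits.  The
sketch below shows the idiom in standalone C.  It is an illustration under
stated assumptions, not the patch's code: int64_t stands in for
HOST_WIDE_INT, the function names are hypothetical, the sign extension masks
its input first, and the zero extension keeps exactly n_bits low bits, which
is one reading of the comment and differs from the n_bits + 1 shift in the
hunk above.

```c
#include <stdint.h>

/* Keep only the low N_BITS of VALUE (1 <= N_BITS <= 63 in this sketch).  */
static uint64_t
zero_extend_bits (int n_bits, int64_t value)
{
  uint64_t mask = (UINT64_C (1) << n_bits) - 1;
  return (uint64_t) value & mask;
}

/* Sign extend the low N_BITS of VALUE (1 <= N_BITS <= 63 in this sketch).
   The final conversion to int64_t relies on the usual two's complement
   behavior of mainstream compilers.  */
static int64_t
sign_extend_bits (int n_bits, int64_t value)
{
  uint64_t sign = UINT64_C (1) << (n_bits - 1);
  uint64_t low = zero_extend_bits (n_bits, value);

  /* Flip the sign bit, then subtract it to smear it upward.  */
  return (int64_t) ((low ^ sign) - sign);
}
```

For a 16-bit element holding -100, zero_extend_bits gives 0xff9c and
sign_extend_bits of that value gives -100 again, which is the round trip the
V8HI pattern depends on.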
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..71894045a54 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,120 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and
+;; use XXSPLTIW to load up V8HImode, V4SImode, and V4SFmode vector
+;; constants where all elements are the same.  We special case loading up
+;; integer -16..15 and floating point 0.0f, since we can use the shorter
+;; XXSPLTIB, VSPLTISH, and VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT uns_value = zero_extend_n_bit (16, INTVAL (operands[1]));
+  HOST_WIDE_INT sign_value = sign_extend_n_bit (16, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%1";
+    }
+
+  operands[2] = GEN_INT ((HOST_WIDE_INT) ((uns_value << 16) | uns_value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = sign_extend_n_bit (32, INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  operands[2] = GEN_INT (sign_value);
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    return "vspltisw %0,%2";
+
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  operands[2] = GEN_INT (rs6000_const_f32_to_i32 (operands[1]));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  rtx element = CONST_VECTOR_ELT (operands[1], 0);
+
+  /* Make sure SImode/HImode are canonical signed constants.  */
+  if (<MODE>mode == SImode)
+    operands[2] = GEN_INT (sign_extend_n_bit (32, INTVAL (element)));
+  else if (<MODE>mode == HImode)
+    operands[2] = GEN_INT (sign_extend_n_bit (16, INTVAL (element)));
+  else
+    operands[2] = element;
+})
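
The dup patterns above always hand XXSPLTIW a 32-bit immediate.  For V8HI the
16-bit element is copied into both halfwords of that word, and for V4SF the
immediate is the bit image of the float.  A minimal sketch of both
computations, assuming hypothetical helper names and a 32-bit IEEE float on
the host; the memcpy stands in for what rs6000_const_f32_to_i32 is assumed to
produce:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: immediate that splats a 16-bit element, built by
   replicating the element into the high and low halfwords.  */
static uint32_t
xxspltiw_imm_from_hi (int16_t elt)
{
  uint32_t u = (uint16_t) elt;
  return (u << 16) | u;
}

/* Hypothetical helper: immediate that splats a float, taken from the
   float's 32-bit representation.  */
static uint32_t
xxspltiw_imm_from_sf (float elt)
{
  uint32_t bits;
  memcpy (&bits, &elt, sizeof bits);
  return bits;
}
```

With these helpers a V8HI splat of 126 would use the immediate 0x007e007e and
a V4SF splat of 1.0f would use 0x3f800000.  Note that -0.0f maps to
0x80000000, not zero, which is consistent with the v4sf test expecting
XXSPLTIW for the negative-zero cases.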
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..8a00aaca70d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -8,6 +8,13 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-maltivec -O3 -fwrapv " } */
 
+/* If the compiler is generating power10 instructions, turn it off.  Otherwise,
+   it will generate a XXSPLTIW instruction instead of LXV/LXVD2X.  */
+
+#ifdef _ARCH_PWR10
+#pragma GCC target ("cpu=power9")
+#endif
+
 #include <altivec.h>
 /* original test as reported.  */
 vector unsigned int splat(void)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-



* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-19 16:20 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-19 16:20 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:52c2ac10ff21a50ece469a81190ed787be615903

commit 52c2ac10ff21a50ece469a81190ed787be615903
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed May 19 12:20:31 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    1 test needed to turn off power10 code generation so that the expected
    instructions would be correct.
    
    gcc/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            * config/rs6000/rs6000.h (sign_extend_n_bit): New inline
            function.
            (zero_extend_n_bit): New inline function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-19  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/pr86731-fwrapv.c: Turn off power10 code
            generation.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000.c                         |  18 ++-
 gcc/config/rs6000/rs6000.h                         |  21 +++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 148 ++++++++++++++++-----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   7 +
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 368 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..f0984e9fec5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index b0e1953aeae..aa6d565ebc3 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -2623,3 +2623,24 @@ while (0)
        rs6000_asm_output_opcode (STREAM);				\
     }									\
   while (0)
+
+/* Sign extend integer values to a given bit size.  */
+static inline HOST_WIDE_INT
+sign_extend_n_bit (int n_bits, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask = HOST_WIDE_INT_1U << (n_bits - 1);
+
+  return (HOST_WIDE_INT) ((uvalue ^ mask) - mask);
+}
+
+/* Zero extend integer values to a given bit size.  */
+static inline unsigned HOST_WIDE_INT
+zero_extend_n_bit (int n_bits, HOST_WIDE_INT value)
+{
+  unsigned HOST_WIDE_INT uvalue = (unsigned HOST_WIDE_INT) value;
+  unsigned HOST_WIDE_INT mask
+    = (HOST_WIDE_INT_1U << n_bits) - HOST_WIDE_INT_1U;
+
+  return uvalue & mask;
+}
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..71894045a54 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,120 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and use
+;; XXSPLTIW to load V8HImode, V4SImode, and V4SFmode vector constants where
+;; all elements are the same.  We special case loading integer -16..15 and
+;; floating point 0.0f, since we can use the shorter XXSPLTIB, VSPLTISH, and
+;; VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT uns_value = zero_extend_n_bit (16, INTVAL (operands[1]));
+  HOST_WIDE_INT sign_value = sign_extend_n_bit (16, uns_value);
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%2";
+    }
+
+  operands[2] = GEN_INT ((HOST_WIDE_INT) ((uns_value << 16) | uns_value));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = sign_extend_n_bit (32, INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  operands[2] = GEN_INT (sign_value);
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    return "vspltisw %0,%2";
+
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  operands[2] = GEN_INT (rs6000_const_f32_to_i32 (operands[1]));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  rtx element = CONST_VECTOR_ELT (operands[1], 0);
+
+  /* Make sure V4SImode/V8HImode elements are canonical signed constants.  */
+  if (<MODE>mode == V4SImode)
+    operands[2] = GEN_INT (sign_extend_n_bit (32, INTVAL (element)));
+  else if (<MODE>mode == V8HImode)
+    operands[2] = GEN_INT (sign_extend_n_bit (16, INTVAL (element)));
+  else
+    operands[2] = element;
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..8a00aaca70d 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -8,6 +8,13 @@
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-maltivec -O3 -fwrapv " } */
 
+/* If the compiler is generating power10 instructions, turn it off.  Otherwise,
+   it will generate a XXSPLTIW instruction instead of LXV/LXVD2X.  */
+
+#ifdef _ARCH_PWR10
+#pragma GCC target ("cpu=power9")
+#endif
+
 #include <altivec.h>
 /* original test as reported.  */
 vector unsigned int splat(void)
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-
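
The sign/zero extension helpers and the V8HI immediate built in the patch above can be illustrated outside of GCC. The following is a minimal sketch, assuming plain `int64_t`/`uint64_t` in place of `HOST_WIDE_INT`; the names `sign_extend_bits`, `zero_extend_bits`, and `splat_halfword_to_word` are hypothetical and are not part of the patch. Unlike the helper in the patch, this sign-extension sketch masks its input to the low N bits first, and it relies on the usual two's-complement conversion from `uint64_t` to `int64_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the low N_BITS of VALUE (1 <= N_BITS <= 63).  */
static uint64_t
zero_extend_bits (int n_bits, int64_t value)
{
  assert (n_bits >= 1 && n_bits <= 63);
  uint64_t field = ((uint64_t) 1 << n_bits) - 1;
  return (uint64_t) value & field;
}

/* Sign extend the low N_BITS of VALUE using the XOR/subtract idiom:
   flipping the sign bit and then subtracting it propagates the sign
   into the upper bits.  */
static int64_t
sign_extend_bits (int n_bits, int64_t value)
{
  uint64_t sign = (uint64_t) 1 << (n_bits - 1);
  uint64_t low = zero_extend_bits (n_bits, value);
  return (int64_t) ((low ^ sign) - sign);
}

/* Build a 32-bit immediate holding the same 16-bit element in both
   halfwords, so that splatting the word also splats the halfword.  */
static uint32_t
splat_halfword_to_word (int64_t element)
{
  uint32_t half = (uint32_t) zero_extend_bits (16, element);
  return (half << 16) | half;
}
```

Under these assumptions, a halfword element of 1023 yields the word immediate 0x03FF03FF, and sign-extending the 16-bit pattern 0xFFFF yields -1.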



* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-19  3:38 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-19  3:38 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:5f855571e9e63dcb68492e3c6ede6919e3f86e94

commit 5f855571e9e63dcb68492e3c6ede6919e3f86e94
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue May 18 23:37:34 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch implements XXSPLTIDP support for SF and DF scalar constants and
    V2DF vector constants.
    
    A new constraint (eF) is added to match constants that can be loaded with
    the XXSPLTIDP instruction.
    
    I have added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.
    
    This patch provides a xxspltidp_constant_p function which decodes both
    VEC_DUPLICATE and CONST_VECTOR expressions (similar to the existing
    xxspltib_constant_p function).
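
The idea of decoding both forms can be sketched with a toy model. This is only an illustration under stated assumptions: `struct toy_vec`, `enum toy_code`, and `toy_splat_element` are hypothetical stand-ins and do not correspond to GCC's rtx API. A duplicate form carries its single element directly, while a constant vector qualifies only if every element `i` equals element 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two vector forms.  */
enum toy_code { TOY_VEC_DUPLICATE, TOY_CONST_VECTOR };

struct toy_vec
{
  enum toy_code code;
  size_t n_elts;		/* 1 for TOY_VEC_DUPLICATE.  */
  const int64_t *elts;
};

/* Store the splatted element in *ELEMENT_OUT and return true if V is a
   duplicate, or a constant vector whose elements are all the same.  */
static bool
toy_splat_element (const struct toy_vec *v, int64_t *element_out)
{
  assert (v != NULL && element_out != NULL && v->n_elts >= 1);

  if (v->code == TOY_CONST_VECTOR)
    for (size_t i = 1; i < v->n_elts; i++)
      if (v->elts[i] != v->elts[0])
	return false;

  *element_out = v->elts[0];
  return true;
}
```

A vector such as {7, 7, 7, 8} is rejected by this sketch because the loop compares each index in turn, not only the second element.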
    
    The xxspltidp_constant_p function returns the appropriate integer that will be
    used in the XXSPLTIDP instruction.  Note, because SFmode denormal values
    are undefined in the hardware, the xxspltidp_constant_p function returns
    false for these values.  Also xxspltidp_constant_p returns false for 0.0
    because it is cheaper to generate without XXSPLTIDP.
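
The checks described above can be sketched in standalone C. This is a minimal sketch, assuming an IEEE 754 host where `float` is binary32 and `double` is binary64; the names `sf_bits_denormal_p` and `splat_candidate_p` are hypothetical and are not the functions added by the patch. For simplicity the sketch rejects NaN, which says nothing about how the real function treats NaN.

```c
#include <assert.h>
#include <float.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A single-precision bit pattern is denormal when the exponent field
   is zero and the mantissa field is non-zero.  */
static bool
sf_bits_denormal_p (uint32_t bits)
{
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

/* Return true and store the single-precision bit pattern in *BITS_OUT
   if D converts to float exactly, is not +0.0, and is not denormal.  */
static bool
splat_candidate_p (double d, uint32_t *bits_out)
{
  assert (bits_out != NULL);

  if (d != d)			/* NaN: left out of this sketch.  */
    return false;

  /* Finite values outside the float range cannot convert exactly.  */
  bool is_inf = (d > DBL_MAX || d < -DBL_MAX);
  if (!is_inf && (d > FLT_MAX || d < -FLT_MAX))
    return false;

  float f = (float) d;
  if ((double) f != d)		/* Not exactly representable as float.  */
    return false;

  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);

  if (bits == 0)		/* +0.0 is cheaper to create another way.  */
    return false;

  if (sf_bits_denormal_p (bits))
    return false;

  *bits_out = bits;
  return true;
}
```

With these assumptions, 1.0 is accepted with the bit pattern 0x3F800000, while 0.0, 0.1 (not exact in single precision), and 0x1p-140 (a single-precision denormal) are rejected.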
    
    I added 3 new tests to test loading up SF/DF scalar and V2DF vector
    constants.
    
    gcc/
    2021-05-18  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/constraints.md (eF): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If we can load
            the scalar constant with XXSPLTIDP, the floating point constant is
            easy.
            (xxspltidp_operand): New predicate.
            (easy_vector_constant): If we can generate XXSPLTIDP, mark the
            vector constant as easy.
            * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
            -mxxspltidp support.
            (POWERPC_MASKS): Add -mxxspltidp support.
            * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltidp support.
            (const_vector_element_all_same): New function.
            (xxspltidp_constant_p): New function.
            (output_vec_const_move): Add support for XXSPLTIDP.
            (rs6000_opt_masks): Add -mxxspltidp support.
            (rs6000_emit_xxspltidp_v2df): Change function to implement the
            XXSPLTIDP instruction.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add XXSPLTIDP
            support.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Add XXSPLTIDP support.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Add XXSPLTIDP support.
            * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIDP): Rename UNSPEC_XXSPLTID
            to UNSPEC_XXSPLTIDP to match the instruction.
            (xxspltidp_v2df): Use 'use' for the expand arguments, instead of
            writing out an insn.
            (xxspltidp_v2df_inst): Delete.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp_<mode>_internal1): New define_insn_and_split.
            (xxspltidp_<mode>_internal2): New define_insn.
    
    gcc/testsuite/
    2021-05-03  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  21 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   2 +
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 111 +++++++++++++++++++--
 gcc/config/rs6000/rs6000.md                        |  52 ++++++----
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  52 +++++++---
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++++++++++
 11 files changed, 394 insertions(+), 38 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 561ce9797af..e1fadd63580 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
+(define_constraint "eF"
+  "A scalar or vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "xxspltidp_operand"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index bf678f429af..8c461ba2b76 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (xxspltidp_operand (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -666,6 +671,19 @@
   return true;
 })
 
+;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
+;; loaded via the ISA 3.1 XXSPLTIDP instruction.  Do not return true if the
+;; value is 0.0, since that is easy to generate without using XXSPLTIDP.
+(define_predicate "xxspltidp_operand"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  HOST_WIDE_INT value = 0;
+  return xxspltidp_constant_p (op, mode, &value);
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -682,6 +700,9 @@
       if (xxspltiw_operand (op, mode))
 	return true;
 
+      if (xxspltidp_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index a21a95bc7aa..3b657e490b1 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -86,6 +86,7 @@
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
 				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIDP		\
 				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
@@ -162,6 +163,7 @@
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
 				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIDP		\
 				 | OPTION_MASK_XXSPLTIW)
 #endif
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index c407034d58c..ea8ca6f8d95 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern bool easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index f0984e9fec5..bac28806a89 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,11 +4487,16 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
-  if (TARGET_POWER10 && TARGET_VSX
-      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
-    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
-  else if (!TARGET_POWER10 || !TARGET_VSX)
-    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+  if (TARGET_POWER10 && TARGET_VSX)
+    {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+	rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIDP) == 0)
+	rs6000_isa_flags |= OPTION_MASK_XXSPLTIDP;
+    }
+  else
+    rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW | OPTION_MASK_XXSPLTIDP);
 
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -6491,6 +6496,96 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return the element of a constant vector whose elements are all the same.  In
+   addition if VEC_DUPLICATE is used, return the element being duplicated.  If
+   neither is true, return NULL_RTX.  */
+
+static rtx
+const_vector_element_all_same (rtx op)
+{
+  if (GET_CODE (op) == VEC_DUPLICATE)
+    {
+      rtx element = XEXP (op, 0);
+      return (CONST_INT_P (element) || CONST_DOUBLE_P (element)
+	       ? element
+	       : NULL_RTX);
+    }
+
+  else if (GET_CODE (op) == CONST_VECTOR)
+    {
+      machine_mode mode = GET_MODE (op);
+      size_t n_elts = GET_MODE_NUNITS (mode);
+      rtx element = CONST_VECTOR_ELT (op, 0);
+
+      for (size_t i = 1; i < n_elts; i++)
+	if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+	  return NULL_RTX;
+
+      return element;
+    }
+
+  return NULL_RTX;
+}
+
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIDP instruction.
+
+   Return the constant that is being split via CONSTANT_PTR to use in the
+   XXSPLTIDP instruction.  */
+
+bool
+xxspltidp_constant_p (rtx op,
+		      machine_mode mode,
+		      HOST_WIDE_INT *constant_ptr)
+{
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTIDP)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  rtx element = op;
+  if (mode == V2DFmode)
+    {
+      element = const_vector_element_all_same (op);
+      if (!element)
+	return false;
+
+      mode = DFmode;
+    }
+
+  if (mode != SFmode && mode != DFmode)
+    return false;
+
+  if (GET_MODE (element) != mode)
+    return false;
+
+  if (!CONST_DOUBLE_P (element))
+    return false;
+
+  /* Don't return true for 0.0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (element == CONST0_RTX (mode))
+    return false;
+
+  /* If the value doesn't fit exactly in SFmode, we can't use XXSPLTIDP.  */
+  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  long value;
+  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
+
+  /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero).  */
+  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
+    return false;
+
+  *constant_ptr = value;
+  return true;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6535,7 +6630,8 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
-      if (xxspltiw_operand (vec, mode))
+      if (xxspltiw_operand (vec, mode)
+	  || xxspltidp_operand (vec, mode))
 	return "#";
 
       if (TARGET_P9_VECTOR
@@ -24130,6 +24226,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
   { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
+  { "xxspltidp",		OPTION_MASK_XXSPLTIDP,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
@@ -27969,7 +28066,7 @@ rs6000_emit_xxspltidp_v2df (rtx dst, long value)
     inform (input_location,
 	    "the result for the xxspltidp instruction "
 	    "is undefined for subnormal input values");
-  emit_insn( gen_xxspltidp_v2df_inst (dst, GEN_INT (value)));
+  emit_insn (gen_xxspltidp_v2df_internal2 (dst, GEN_INT (value)));
 }
 
 /* Implement TARGET_ASM_GENERATE_PIC_ADDR_DIFF_VEC.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 0c76338c734..57bbe281cee 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7614,17 +7614,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7646,15 +7646,20 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")
+   (set_attr "prefixed"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         yes")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -7914,18 +7919,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -7942,20 +7947,25 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")
+   (set_attr "prefixed"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          yes")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -7982,19 +7992,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8016,18 +8026,24 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")
+   (set_attr "prefixed"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          yes")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 5e282d3741c..03e7ed28634 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -631,3 +631,7 @@ Generate code that will run in privileged state.
 mxxspltiw
 Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
 Generate (do not generate) XXSPLTIW instructions.
+
+mxxspltidp
+Target Undocumented Mask(XXSPLTIDP) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index c850864c7ad..44e8fe1a974 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,7 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTID
+   UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
@@ -6240,9 +6240,8 @@
 
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
-  [(set (match_operand:V2DF 0 "register_operand" )
-	(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
-		     UNSPEC_XXSPLTID))]
+  [(use (match_operand:V2DF 0 "register_operand" ))
+   (use (match_operand:SF 1 "const_double_operand"))]
  "TARGET_POWER10"
 {
   long value = rs6000_const_f32_to_i32 (operands[1]);
@@ -6250,15 +6249,6 @@
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTID))]
-  "TARGET_POWER10"
-  "xxspltidp %x0,%1"
-  [(set_attr "type" "vecsimple")
-   (set_attr "prefixed" "yes")])
-
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
@@ -6484,3 +6474,39 @@
 {
   operands[2] = CONST_VECTOR_ELT (operands[1], 0);
 })
+
+;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar
+;; constants and V2DF vector constants where both elements are the same.  The
+;; constant has to be expressible as an SFmode constant that is not an SFmode
+;; denormal value.
+(define_mode_iterator XXSPLTIDP [SF DF V2DF])
+
+(define_insn_and_split "*xxspltidp_<mode>_internal1"
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP 1 "xxspltidp_operand"))]
+  "TARGET_XXSPLTIDP"
+  "#"
+  "&& 1"
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  HOST_WIDE_INT value = 0;
+
+  if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (value);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
+;; Just in case the user issued -mno-xxspltidp, allow the built-in function
+;; even if the compiler does not automatically generate XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal2"
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand 1 "const_int_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..d509459292c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLFD.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */

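The comment on the XXSPLTIDP splitter above says the constant must be expressible as an SFmode value that is not an SFmode denormal. The following is a minimal sketch of that check on host values, assuming IEEE 754 single and double formats. The helper name `fits_sf_immediate` is hypothetical; it is not GCC's `xxspltidp_constant_p`, which works on RTL constants and may apply further rules (for example, 0.0 is left to XXSPLTIB or XXLXOR, and NaN payloads are not examined here).

```c
#include <assert.h>   /* assert() for callers that sanity-check results */
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not part of GCC: return true if the double D could be
   produced by splatting a single-precision immediate, and store the 32-bit
   image of that single-precision value in *IMM.  */
static bool
fits_sf_immediate (double d, uint32_t *imm)
{
  float f = (float) d;
  uint32_t bits;

  /* The value must survive the double -> float -> double round trip.
     NaN never compares equal to itself, so it is accepted separately.  */
  if (!isnan (d) && (double) f != d)
    return false;

  memcpy (&bits, &f, sizeof bits);

  /* Reject SFmode denormals: exponent field zero, mantissa nonzero.  */
  if ((bits & 0x7f800000u) == 0 && (bits & 0x007fffffu) != 0)
    return false;

  *imm = bits;
  return true;
}
```

Under these assumptions, 1.0, -0.0, and infinity are accepted, while M_PI (not exact in single precision) and 0x1p-149f (a single-precision denormal) are rejected, which lines up with the XXSPLTIDP and PLFD annotations in the DFmode and V2DFmode tests.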


* [gcc(refs/users/meissner/heads/work053)] Generate XXSPLTIW on power10.
@ 2021-05-18 22:57 Michael Meissner
  0 siblings, 0 replies; 7+ messages in thread
From: Michael Meissner @ 2021-05-18 22:57 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3c51a16356ee96826d87b3401e72078b22e50e9c

commit 3c51a16356ee96826d87b3401e72078b22e50e9c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue May 18 18:57:20 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    gcc/
    2021-05-18  Michael Meissner  <meissner@linux.ibm.com>
    
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-cpus.def (ISA_3_1_MASKS_SERVER): Add
            -mxxspltiw support.
            (POWERPC_MASKS): Add -mxxspltiw support.
            * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
            -mxxspltiw support.
            (xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
            a XXSPLTIB and an extend instruction.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (rs6000_opt_masks): Add -mxxspltiw.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
            (XXSPLTIW): New mode iterator.
            (XXSPLTIW splitter): New insn splitter for XXSPLTIW.
    
    gcc/testsuite/
    2021-05-03  Michael Meissner  <meissner@linux.ibm.com>
    
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  29 +++++
 gcc/config/rs6000/rs6000-cpus.def                  |   7 +-
 gcc/config/rs6000/rs6000.c                         |  18 ++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 126 ++++++++++++++++-----
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 +++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 9 files changed, 318 insertions(+), 40 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..bf678f429af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,32 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  if (element == CONST0_RTX (GET_MODE_INNER (mode)))
+    return false;
+
+  if (CONST_INT_P (element) && EASY_VECTOR_15 (INTVAL (element)))
+    return false;
+
+  return true;
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +679,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..a21a95bc7aa 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -85,7 +85,8 @@
 				 | OTHER_POWER10_MASKS			\
 				 | OPTION_MASK_P10_FUSION		\
 				 | OPTION_MASK_P10_FUSION_LD_CMPI	\
-				 | OPTION_MASK_P10_FUSION_2LOGICAL)
+				 | OPTION_MASK_P10_FUSION_2LOGICAL	\
+				 | OPTION_MASK_XXSPLTIW)
 
 /* Flags that need to be turned off if -mno-power9-vector.  */
 #define OTHER_P9_VECTOR_MASKS	(OPTION_MASK_FLOAT128_HW		\
@@ -160,8 +161,8 @@
 				 | OPTION_MASK_RECIP_PRECISION		\
 				 | OPTION_MASK_SOFT_FLOAT		\
 				 | OPTION_MASK_STRICT_ALIGN_OPTIONAL	\
-				 | OPTION_MASK_VSX)
-
+				 | OPTION_MASK_VSX			\
+				 | OPTION_MASK_XXSPLTIW)
 #endif
 
 /* This table occasionally claims that a processor does not support a
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ef1ebaaee05..f0984e9fec5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4487,6 +4487,12 @@ rs6000_option_override_internal (bool global_init_p)
   if (!TARGET_PCREL && TARGET_PCREL_OPT)
     rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
 
+  if (TARGET_POWER10 && TARGET_VSX
+      && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+    rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+  else if (!TARGET_POWER10 || !TARGET_VSX)
+    rs6000_isa_flags &= ~OPTION_MASK_XXSPLTIW;
+
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
 
@@ -6464,9 +6470,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6527,6 +6535,9 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_operand (vec, mode))
+	return "#";
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -24118,6 +24129,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string",			0,				false, true  },
   { "update",			OPTION_MASK_NO_UPDATE,		true , true  },
   { "vsx",			OPTION_MASK_VSX,		false, true  },
+  { "xxspltiw",			OPTION_MASK_XXSPLTIW,		false, true  },
 #ifdef OPTION_MASK_64BIT
 #if TARGET_AIX_OS
   { "aix64",			OPTION_MASK_64BIT,		false, false },
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 2685fa71517..5e282d3741c 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a8c0e22d8..c850864c7ad 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -386,7 +386,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -6239,36 +6238,6 @@
   "vmulld %0,%1,%2"
   [(set_attr "type" "veccomplex")])
 
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6420,3 +6389,98 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and use
+;; XXSPLTIW to load up V8HImode, V4SImode, and V4SFmode vector constants where
+;; all elements are the same.  We special case loading up integer -16..15 and
+;; floating point 0.0f, since we can use the shorter XXSPLTIB, VSPLTISH, and
+;; VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT uns_value = INTVAL (operands[1]) & 0xffff;
+  HOST_WIDE_INT sign_value = (uns_value ^ 0x8000) - 0x8000;
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    return "vspltish %0,%1";
+
+  operands[2] = GEN_INT ((uns_value << 16) | uns_value);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+ "@
+  xxspltib %x0,0
+  xxspltib %x0,255
+  vspltisw %0,%1
+  xxspltiw %x0,%1"
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  operands[2] = GEN_INT (rs6000_const_f32_to_i32 (operands[1]));
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecsimple")
+  (set_attr "prefixed" "*,yes")])
+
+;; Convert vector constant to vec_duplicate.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_split
+  [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+	(match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+  "TARGET_XXSPLTIW && GET_CODE (operands[1]) == CONST_VECTOR"
+  [(set (match_dup 0)
+	(vec_duplicate:<MODE> (match_dup 2)))]
+{
+  operands[2] = CONST_VECTOR_ELT (operands[1], 0);
+})
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-

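The V8HI and V4SF patterns in the vsx.md hunk above both reduce the element to a 32-bit immediate for XXSPLTIW: the V8HI pattern replicates the 16-bit element into both halves of the word, and the V4SF pattern uses the bit image of the float. A minimal host-side sketch of that arithmetic follows. Both helper names are hypothetical, and `v4sf_to_xxspltiw_imm` merely stands in for `rs6000_const_f32_to_i32`, which operates on RTL constants rather than host floats; IEEE 754 single format is assumed.

```c
#include <assert.h>   /* assert() for callers that sanity-check results */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: replicate a 16-bit element into both halves of the
   32-bit XXSPLTIW immediate, mirroring (uns_value << 16) | uns_value in the
   V8HI pattern.  */
static uint32_t
v8hi_to_xxspltiw_imm (int16_t element)
{
  uint32_t uns_value = (uint16_t) element;
  return (uns_value << 16) | uns_value;
}

/* Hypothetical helper: reinterpret the bits of a float as the 32-bit
   immediate used for a V4SF splat.  */
static uint32_t
v4sf_to_xxspltiw_imm (float element)
{
  uint32_t bits;
  memcpy (&bits, &element, sizeof bits);
  return bits;
}
```

For instance, a V8HI splat of 126 becomes the word 0x007e007e, so one 32-bit splat fills all eight halfword lanes.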

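The four alternatives of the V4SI vec_duplicate pattern above pick the shortest instruction for a given splat value. The sketch below restates that choice as a plain C function, under the simplifying assumption that register class is ignored (in the real pattern the VSPLTISW alternative requires an Altivec register). The function name `v4si_splat_insn` is hypothetical and the returned strings are only labels, not complete assembly.

```c
#include <assert.h>   /* assert() for callers that sanity-check results */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: name the instruction a V4SI splat of VALUE would be
   expected to use, ignoring register-class constraints.  */
static const char *
v4si_splat_insn (int32_t value)
{
  if (value == 0)
    return "xxspltib 0";	/* All-zero bytes.  */

  if (value == -1)
    return "xxspltib 255";	/* All-one bytes.  */

  if (value >= -16 && value <= 15)
    return "vspltisw";		/* 5-bit signed immediate.  */

  return "xxspltiw";		/* Prefixed 32-bit immediate.  */
}
```

This agrees with the annotations in vec-splat-constant-v4si.c, where a splat of 1 is expected to use VSPLTISW and splats of 126 and 1023 are expected to use XXSPLTIW.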