[gcc(refs/users/meissner/heads/work065)] PATCH Generate XXSPLTIW on power10.

public inbox for gcc-cvs@sourceware.org

* [gcc(refs/users/meissner/heads/work065)] PATCH Generate XXSPLTIW on power10.
@ 2021-08-18  0:04 Michael Meissner
  0 siblings, 0 replies; only message in thread
From: Michael Meissner @ 2021-08-18  0:04 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:001b2c167a233cd7db2524fc67cbf0136049aedd

commit 001b2c167a233cd7db2524fc67cbf0136049aedd
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Aug 17 20:03:59 2021 -0400

    PATCH Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    recognizing vector constants that XXSPLTIW can load directly, and by adding
    VEC_DUPLICATE patterns to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTIW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 3 new tests to test loading up V8HI, V4SI, and V4SF vector
    constants.
    
    The pr86731-fwrapv.c test needed to be adjusted to account for xxspltiw
    code generation on power10.
    
    2021-08-17  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't generate a XXSPLTIB and an extend instruction.
            (const_vector_all_elements_equal_p): New function.
            (xxspltiw_constant_p): New function.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (prefixed_permute_p): New function.
            * config/rs6000/rs6000.h (TARGET_XXSPLTIW): New macro.
            (SIGN_EXTEND_8BIT): New macro.
            (SIGN_EXTEND_16BIT): New macro.
            (SIGN_EXTEND_32BIT): New macro.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            prefixed permute instructions like XXSPLTIW.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Delete.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            (xxspltiw_v8hi): New insn.
            (xxspltiw_v4si): Rewrite to generate a vector constant.
            (xxspltiw_v4sf): Rewrite to generate a vector constant.
            (xxspltiw_v4si_inst): Delete.
            (xxspltiw_v4sf_inst): Delete.
            (xxspltiw_v8hi_dup): New insn.
            (xxspltiw_v4si_dup): New insn.
            (xxspltiw_v4sf_dup): New insn.
    
    gcc/testsuite/
            * gcc.target/powerpc/pr86731-fwrapv.c: Update insn counts on
            power10.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
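
The constant handling described above can be sketched in portable C.  This is
only an illustration of the arithmetic, not code from the patch: the function
names below are hypothetical, and the -16..15 range check stands in for the
compiler's EASY_VECTOR_15 test.  It shows how a splatted element becomes the
unsigned 32-bit immediate that XXSPLTIW takes, including the V8HImode case
where two halfwords share one word.

```c
#include <stdint.h>

/* Hypothetical stand-in for the EASY_VECTOR_15 check: values in -16..15
   can use the shorter VSPLTISH/VSPLTISW instructions instead.  */
int fits_vspltis(int64_t value)
{
  return value >= -16 && value <= 15;
}

/* V4SI splat: the immediate is the element viewed as an unsigned
   32-bit value, because the assembler rejects negative numbers.  */
uint32_t v4si_immediate(int32_t element)
{
  return (uint32_t) element;
}

/* V8HI splat: the 16-bit element is replicated into both halves of
   the 32-bit word that XXSPLTIW then splats.  */
uint32_t v8hi_immediate(int16_t element)
{
  uint32_t half = (uint16_t) element;
  return (half << 16) | half;
}

/* V8HI with alternating even/odd elements: the two halfwords are packed
   into one word, swapped on little-endian targets.  */
uint32_t v8hi_pair_immediate(int16_t even, int16_t odd, int big_endian)
{
  uint32_t hi = (uint16_t) (big_endian ? even : odd);
  uint32_t lo = (uint16_t) (big_endian ? odd : even);
  return (hi << 16) | lo;
}
```

For example, a V8HImode splat of 126 would use the immediate 0x007e007e, while
a splat of 1 falls inside the VSPLTISH range and never reaches this path.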

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  14 ++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 169 ++++++++++++++++++++-
 gcc/config/rs6000/rs6000.h                         |  24 +++
 gcc/config/rs6000/rs6000.md                        |   5 +
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 147 ++++++++++++++----
 gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c  |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  66 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  53 +++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 13 files changed, 514 insertions(+), 39 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..fe30ca4ea8f 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,11 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; Vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+  "A vector constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "xxspltiw_operand"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..00fdeb1b8e3 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,17 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.  Do not return 1 if the constant can be generated with
+;; XXSPLTIB or VSPLTIS{H,W}.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  HOST_WIDE_INT xxspltiw_value = 0;
+
+  return xxspltiw_constant_p (op, mode, &xxspltiw_value);
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +664,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..139a487a19d 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_permute_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e073b26b430..c26ca38fbcc 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6514,9 +6514,11 @@ xxspltib_constant_p (rtx op,
 
   /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
      sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
+     of an Altivec register.  Also if we can generate a XXSPLTIW instruction,
+     don't emit a XXSPLTIB and an extend instruction.  */
+  if ((mode == V4SImode || mode == V8HImode)
+      && !IN_RANGE (value, -1, 0)
+      && (EASY_VECTOR_15 (value) || TARGET_XXSPLTIW))
     return false;
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
@@ -6533,6 +6535,129 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return true if the argument is a constant vector where all elements are the
+   same.  */
+
+static bool
+const_vector_all_elements_equal_p (rtx op, machine_mode mode)
+{
+  if (!CONST_VECTOR_P (op))
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  if (!CONST_INT_P (element) && !CONST_DOUBLE_P (element))
+    return false;
+
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  return true;
+}
+
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIW instruction.
+
+   Return the constant via CONSTANT_PTR to use in the XXSPLTIW instruction.
+   The assembler does not like negative numbers for XXSPLTIW, so we need to
+   return a 32-bit unsigned value.  */
+
+bool
+xxspltiw_constant_p (rtx op,
+		     machine_mode mode,
+		     HOST_WIDE_INT *constant_ptr)
+{
+  HOST_WIDE_INT value;
+
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (!CONST_VECTOR_P (op))
+    return false;
+
+  rtx element0 = CONST_VECTOR_ELT (op, 0);
+
+  switch (mode)
+    {
+      /* V4SImode constant vectors that have the same element can be used
+	 with XXSPLTIW.  */
+    case V4SImode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true if we can use the shorter vspltisw instruction.  */
+      value = INTVAL (element0);
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V4SFmode constant vectors that have the same element can be used
+	 with XXSPLTIW.  */
+    case V4SFmode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true for 0.0f, since that can be created with
+	 xxspltib.  */
+      if (element0 == CONST0_RTX (SFmode))
+	return false;
+
+      value = rs6000_const_f32_to_i32 (element0);
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V8HImode constant vectors that have the same element can be used
+	 with XXSPLTIW.  */
+    case V8HImode:
+      if (const_vector_all_elements_equal_p (op, mode))
+	{
+	  /* Don't return true if we can use the shorter vspltish instruction.  */
+	  value = INTVAL (element0);
+	  if (EASY_VECTOR_15 (value))
+	    return false;
+
+	  value &= 0xffff;
+	  *constant_ptr = (value << 16) | value;
+	  return true;
+	}
+
+      else
+	{
+	  /* Check if all even elements are the same and all odd elements are
+	     the same.  */
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+
+	  if (rtx_equal_p (element0, CONST_VECTOR_ELT (op, 2))
+	      && rtx_equal_p (element1, CONST_VECTOR_ELT (op, 3))
+	      && rtx_equal_p (element0, CONST_VECTOR_ELT (op, 4))
+	      && rtx_equal_p (element1, CONST_VECTOR_ELT (op, 5))
+	      && rtx_equal_p (element0, CONST_VECTOR_ELT (op, 6))
+	      && rtx_equal_p (element1, CONST_VECTOR_ELT (op, 7)))
+	    {
+	      HOST_WIDE_INT even = INTVAL (element0) & 0xffff;
+	      HOST_WIDE_INT odd = INTVAL (element1) & 0xffff;
+
+	      if (!BYTES_BIG_ENDIAN)
+		std::swap (even, odd);
+
+	      *constant_ptr = (even << 16) | odd;
+	      return true;
+	    }
+
+	  break;
+	}
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6548,6 +6673,7 @@ output_vec_const_move (rtx *operands)
     {
       bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
       int xxspltib_value = 256;
+      HOST_WIDE_INT xxspltiw_value = 0;
       int num_insns = -1;
 
       if (zero_constant (vec, mode))
@@ -6577,6 +6703,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+	{
+	  operands[2] = GEN_INT (xxspltiw_value);
+	  return "xxspltiw %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26219,6 +26351,37 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed instruction.  This is
+   called from the prefixed attribute processing.  */
+
+bool
+prefixed_permute_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case V8HImode:
+    case V4SImode:
+    case V4SFmode:
+      return xxspltiw_operand (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index c5d20d240f2..88c10c55eb5 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -504,6 +504,11 @@ extern int rs6000_vector_align[];
 #define TARGET_MINMAX	(TARGET_HARD_FLOAT && TARGET_PPC_GFXOPT		\
 			 && (TARGET_P9_MINMAX || !flag_trapping_math))
 
+/* Whether we can generate the XXSPLTI* prefixed instructions.  We also need
+   VSX instructions to be generated.  */
+#define TARGET_XXSPLTIW		(TARGET_XXSPLTIW_DEBUG && TARGET_PREFIXED \
+				 && TARGET_VSX)
+
 /* In switching from using target_flags to using rs6000_isa_flags, the options
    machinery creates OPTION_MASK_<xxx> instead of MASK_<xxx>.  For now map
    OPTION_MASK_<xxx> back into MASK_<xxx>.  */
@@ -2609,3 +2614,22 @@ while (0)
        rs6000_asm_output_opcode (STREAM);				\
     }									\
   while (0)
+
+/* Provide macros for sign-extending values.  */
+#if HOST_BITS_PER_CHAR == 8
+#define SIGN_EXTEND_8BIT(X) ((HOST_WIDE_INT)(signed char)(X))
+#else
+#define SIGN_EXTEND_8BIT(X) ((((X) & 0xff) ^ 0x80) - 0x80)
+#endif
+
+#if HOST_BITS_PER_SHORT == 16
+#define SIGN_EXTEND_16BIT(X) ((HOST_WIDE_INT)(short)(X))
+#else
+#define SIGN_EXTEND_16BIT(X) ((((X) & 0xffff) ^ 0x8000) - 0x8000)
+#endif
+
+#if HOST_BITS_PER_INT == 32
+#define SIGN_EXTEND_32BIT(X) ((HOST_WIDE_INT)(int)(X))
+#else
+#define SIGN_EXTEND_32BIT(X) ((((X) & 0xffffffff) ^ 0x80000000) - 0x80000000)
+#endif
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a84438f8545..9ea9568cbe4 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_permute_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..9548334d846 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW_DEBUG) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e4ca6e94d49..b7141b00435 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -373,7 +373,6 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXEVAL
-   UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTID
    UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
@@ -1192,16 +1191,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXSPLTIW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eW,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,35 +1215,43 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;              XXSPLTIW
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eW,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1263,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -6407,36 +6420,6 @@
   [(set_attr "type" "veccomplex")])
 
 \f
-;; XXSPLTIW built-in function support
-(define_insn "xxspltiw_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-	(unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
-  long value = rs6000_const_f32_to_i32 (operands[1]);
-  emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-	(unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
-  (set_attr "prefixed" "yes")])
-
 ;; XXSPLTIDP built-in function support
 (define_expand "xxspltidp_v2df"
   [(set (match_operand:V2DF 0 "register_operand" )
@@ -6589,3 +6572,105 @@
    [(set_attr "type" "vecsimple")
     (set_attr "prefixed" "yes")])
 
+;; XXSPLTIW built-in function support.  Convert to a vector constant, which
+;; will then be optimized to the XXSPLTIW instruction.
+(define_expand "xxspltiw_v4si"
+  [(use (match_operand:V4SI 0 "register_operand"))
+   (use (match_operand:SI 1 "s32bit_cint_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SImode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+(define_expand "xxspltiw_v4sf"
+  [(use (match_operand:V4SF 0 "register_operand"))
+   (use (match_operand:SF 1 "const_double_operand"))]
+  "TARGET_POWER10"
+{
+  rtx op1 = operands[1];
+  rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+  rtx vec_constant = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+  emit_move_insn (operands[0], vec_constant);
+})
+
+;; XXSPLTIW support.  Add support for the XXSPLTIW built-in functions, and to
+;; use XXSPLTIW to load up V8HImode, V4SImode, and V4SFmode vector
+;; constants where all elements are the same.  We special case loading up
+;; integer -16..15 and floating point 0.0f, since we can use the shorter
+;; XXSPLTIB, VSPLTISH, and VSPLTISW instructions.
+
+(define_insn "*xxspltiw_v8hi_dup"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V8HI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = SIGN_EXTEND_16BIT (INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltish %0,%2";
+    }
+
+  HOST_WIDE_INT uns_value = sign_value & 0xffff;
+  operands[2] = GEN_INT ((uns_value << 16) | uns_value);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "*xxspltiw_v4si_dup"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa,v,wa")
+	(vec_duplicate:V4SI
+	 (match_operand 1 "const_int_operand" "O,wM,wB,n")))]
+ "TARGET_XXSPLTIW"
+{
+  HOST_WIDE_INT sign_value = SIGN_EXTEND_32BIT (INTVAL (operands[1]));
+
+  if (sign_value == 0)
+    return "xxspltib %x0,0";
+
+  if (sign_value == -1)
+    return "xxspltib %x0,255";
+
+  int r = reg_or_subregno (operands[0]);
+  if (ALTIVEC_REGNO_P (r) && EASY_VECTOR_15 (sign_value))
+    {
+      operands[2] = GEN_INT (sign_value);
+      return "vspltisw %0,%2";
+    }
+
+  /* The assembler doesn't like negative values.  */
+  operands[2] = GEN_INT (sign_value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,*,*,yes")])
+
+(define_insn "xxspltiw_v4sf_dup"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand:SF 1 "const_double_operand" "O,F")))]
+ "TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (SFmode))
+    return "xxspltib %x0,0";
+
+  /* The assembler doesn't like negative values.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "*,yes")])
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
index f312550f04d..22e43d21565 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv.c
@@ -57,7 +57,12 @@ vector signed int splats3(void)
    If folding is enabled, the vec_sl tests using vector long long type will
    generate a lvx instead of a vspltisw+vsld pair.  */
 
-/* { dg-final { scan-assembler-times {\mvspltis[bhw]\M|\mxxspltib\M} 7 } } */
-/* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 7 } } */
+/* { dg-final { scan-assembler-times {\mvspltis[bhw]\M|\mxxspltib\M} 7 { target { ! has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 7                 { target { ! has_arch_pwr10 } } } } */
+
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  2                 { target {   has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5                 { target {   has_arch_pwr10 } } } } */
+/* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 2                 { target {   has_arch_pwr10 } } } } */
+
 /* { dg-final { scan-assembler-times {\mlvx\M|\mlxvd2x\M} 0 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..06830b02076
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,66 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  8 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VEXTSB2W) is not generated.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..e6d0fab6d67
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,53 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VUPKLSB) is not generated.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-
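
For the V4SFmode case, the patch passes the 32-bit image of the float as the
XXSPLTIW immediate and only special-cases +0.0f.  The sketch below illustrates
why -0.0f still needs XXSPLTIW, as the new v4sf test expects.  It is an
assumption-laden illustration, not code from the patch: the function names are
hypothetical, and it assumes the host `float` is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit image of a float (assumes IEEE 754 binary32),
   analogous in spirit to what rs6000_const_f32_to_i32 yields.  */
uint32_t f32_image(float f)
{
  uint32_t bits;
  memcpy(&bits, &f, sizeof bits);
  return bits;
}

/* Hypothetical check: only +0.0f has an all-zero image, so it is the
   only float splat that a plain "xxspltib reg,0" can produce.  */
int v4sf_wants_xxspltiw(float f)
{
  return f32_image(f) != 0;
}
```

Under these assumptions, 1.0f maps to 0x3f800000 and -0.0f maps to 0x80000000,
so both would be loaded with XXSPLTIW, while +0.0f would not.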

