public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed
* [gcc(refs/users/meissner/heads/work066)] Generate XXSPLTIW on power10.
@ 2021-08-24 22:28 Michael Meissner
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Meissner @ 2021-08-24 22:28 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:b9c47f59300e9a7812277d6a78d19dfcd6b1f3a2

commit b9c47f59300e9a7812277d6a78d19dfcd6b1f3a2
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Aug 24 18:28:15 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-08-24  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't generate a XXSPLTIB and an extend instruction.
            (const_vector_all_elements_equal_p): New function.
            (xxspltiw_constant_p): New function.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (prefixed_permute_p): Recognize xxspltiw instructions as
            prefixed.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            constants loaded with XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            (vsx_splat_v8hi_xxspltiw): New insn.
            (vsx_splat_v4si_xxspltiw): New insn.
            (vsx_splat_v4sf_xxspltiw): New insn.
    
    gcc/testsuite/
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  13 ++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 174 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  74 +++++++--
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 ++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 ++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 465 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 4637345f84b..82fecca4a91 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -218,6 +218,11 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; Vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+  "A vector constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "xxspltiw_operand"))
+
 ;; KF/TF scalar than can be loaded with LXVKQ
 (define_constraint "eQ"
   "An IEEE 128-bit constant that can be loaded with the LXVKQ instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 1a87272604b..b601f73600f 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -650,6 +650,16 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  HOST_WIDE_INT xxspltiw_value = 0;
+
+  return xxspltiw_constant_p (op, mode, &xxspltiw_value);
+})
+
 ;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
 ;; loaded via the ISA 3.1 XXSPLTIDP instruction.
 (define_predicate "xxspltidp_operand"
@@ -681,6 +691,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (xxspltidp_operand (op, mode))
 	return true;
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index de1c1ee9a8b..181d20d7e05 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern bool lxvkq_constant_p (rtx, machine_mode, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4f12f9eb968..86e79977ec2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6526,6 +6526,10 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* See if we could generate XXSPLTIW directly.  */
+  else if (xxspltiw_operand (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -6533,6 +6537,164 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return true if the argument is a constant vector where all elements are the
+   same.  */
+
+static bool
+const_vector_all_elements_equal_p (rtx op, machine_mode mode)
+{
+  if (!CONST_VECTOR_P (op))
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  if (!CONST_INT_P (element) && !CONST_DOUBLE_P (element))
+    return false;
+
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  return true;
+}
+
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIW instruction.
+
+   Return the constant via CONSTANT_PTR to use in the XXSPLTIW instruction.
+   The assembler does not like negative numbers for XXSPLTIW, so we need to
+   return a 16-bit unsigned value.  */
+
+bool
+xxspltiw_constant_p (rtx op,
+		     machine_mode mode,
+		     HOST_WIDE_INT *constant_ptr)
+{
+  HOST_WIDE_INT value;
+
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTIW)
+    return false;
+
+  if (!CONST_VECTOR_P (op))
+    return true;
+
+  rtx element0 = CONST_VECTOR_ELT (op, 0);
+
+  switch (mode)
+    {
+      /* V4SImode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V4SImode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true if we can use the shorter vspltisw instruction.  */
+      value = INTVAL (element0);
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V4SFmode constant vectors that have the same element are
+	 can be used with XXSPLTIW.  */
+    case V4SFmode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true for 0.0f, since that can be created with
+	 xxspltib.  */
+      if (element0 == CONST0_RTX (SFmode))
+	return false;
+
+      value = rs6000_const_f32_to_i32 (element0);
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V8Hmode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V8HImode:
+      if (const_vector_all_elements_equal_p (op, mode))
+	{
+	  /* Don't return true if we can use the shorter vspltish instruction.  */
+	  value = INTVAL (element0);
+	  if (EASY_VECTOR_15 (value))
+	    return false;
+
+	  value &= 0xffff;
+	  *constant_ptr = (value << 16) | value;
+	  return true;
+	}
+
+      else
+	{
+	  /* Check if all even elements are the same and all odd elements are
+	     the same.  */
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+
+	  if (!CONST_INT_P (element1))
+	    return false;
+
+	  for (size_t i = 2; i < GET_MODE_NUNITS (V8HImode); i += 2)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xffff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xffff;
+
+	  if (!BYTES_BIG_ENDIAN)
+	    std::swap (value0, value1);
+
+	  *constant_ptr = (value0 << 16) | value1;
+	  return true;
+	}
+
+      /* V16QI constant vectors that have the first four elements identical to
+	 the next set of 4 elements, and so forth can generate XXSPLTIW.  */
+    case V16QImode:
+	{
+	  if (xxspltib_constant_nosplit (op, mode))
+	    return false;
+
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+	  rtx element2 = CONST_VECTOR_ELT (op, 2);
+	  rtx element3 = CONST_VECTOR_ELT (op, 3);
+
+	  if (!CONST_INT_P (element0) || !CONST_INT_P (element1)
+	      || !CONST_INT_P (element2) || !CONST_INT_P (element3))
+	    return false;
+
+	  for (size_t i = 4; i < GET_MODE_NUNITS (V16QImode); i += 4)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1))
+		|| !rtx_equal_p (element2, CONST_VECTOR_ELT (op, i + 2))
+		|| !rtx_equal_p (element3, CONST_VECTOR_ELT (op, i + 3)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xff;
+	  HOST_WIDE_INT value2 = INTVAL (element2) & 0xff;
+	  HOST_WIDE_INT value3 = INTVAL (element3) & 0xff;
+
+	  if (BYTES_BIG_ENDIAN)
+	    *constant_ptr = ((value0 << 24) | (value1 << 16) | (value2 << 8)
+			     | value3);
+	  else
+	    *constant_ptr = ((value3 << 24) | (value2 << 16) | (value1 << 8)
+			     | value0);
+
+	  return true;
+	}
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
    XXSPLTIDP instruction.
 
@@ -6722,6 +6884,7 @@ output_vec_const_move (rtx *operands)
     {
       bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
       int xxspltib_value = 256;
+      HOST_WIDE_INT xxspltiw_value = 0;
       HOST_WIDE_INT xxspltidp_value = 0;
       int num_insns = -1;
       int lxvkq_immediate = 0;
@@ -6753,6 +6916,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+	{
+	  operands[2] = GEN_INT (xxspltiw_value);
+	  return "xxspltiw %x0,%2";
+	}
+
       if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
 	{
 	  operands[2] = GEN_INT (xxspltidp_value);
@@ -26435,6 +26604,11 @@ prefixed_permute_p (rtx_insn *insn)
 
   switch (mode)
     {
+    case V8HImode:
+    case V4SImode:
+    case V4SFmode:
+      return xxspltiw_operand (src, mode);
+
     case DFmode:
     case SFmode:
     case V2DFmode:
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 016b19e5d2d..ebdf7cd036d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 00f6d0eda14..4f716a9f2d2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,19 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,
+                wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,
+                *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,
+                wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,
+                *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -4666,6 +4666,52 @@
    (set_attr "length" "*,8,*")
    (set_attr "isa" "*,p8v,*")])
 
+;; V8HI/V4SI/V4SF splat immediate constant with XXSPLTIW.  We don't need to add
+;; V16QI since the xxspltib instruction already handles this case.
+(define_insn "*vsx_splat_v8hi_xxspltiw"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V8HI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]) & 0xffff;
+
+  operands[2] = GEN_INT ((value << 16) | value);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4si_xxspltiw"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V4SI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  /* The assembler doesn't like negative numbers.  */
+  operands[2] = GEN_INT (INTVAL (operands[1]) & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4sf_xxspltiw"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand 1 "const_double_operand" "j,F")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (V4SFmode))
+    return "xxlxor %x0,%x0,%x0";
+
+  /* The assembler doesn't like negative numbers.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecsimple,vecperm")
+   (set_attr "prefixed" "*,yes")])
+
 ;; V4SF/V4SI splat from a vector element
 (define_insn "vsx_xxspltw_<mode>"
   [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16HI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSLTPISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSLTPISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSLTPISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSLTPISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSLTPISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimiza V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [gcc(refs/users/meissner/heads/work066)] Generate XXSPLTIW on power10.
@ 2021-08-25  3:41 Michael Meissner
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Meissner @ 2021-08-25  3:41 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:43626b0ffaa26c4435b39f346765eb2313e0016a

commit 43626b0ffaa26c4435b39f346765eb2313e0016a
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Aug 24 23:41:27 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-08-24  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't generate a XXSPLTIB and an extend instruction.
            (const_vector_all_elements_equal_p): New function.
            (xxspltiw_constant_p): New function.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (prefixed_permute_p): Recognize xxspltiw instructions as
            prefixed.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            constants loaded with XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            (vsx_splat_v8hi_xxspltiw): New insn.
            (vsx_splat_v4si_xxspltiw): New insn.
            (vsx_splat_v4sf_xxspltiw): New insn.
    
    gcc/testsuite/
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  13 ++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 174 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  74 +++++++--
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 ++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 ++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 465 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 4637345f84b..82fecca4a91 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -218,6 +218,11 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; Vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+  "A vector constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "xxspltiw_operand"))
+
 ;; KF/TF scalar than can be loaded with LXVKQ
 (define_constraint "eQ"
   "An IEEE 128-bit constant that can be loaded with the LXVKQ instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 1a87272604b..b601f73600f 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -650,6 +650,16 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  HOST_WIDE_INT xxspltiw_value = 0;
+
+  return xxspltiw_constant_p (op, mode, &xxspltiw_value);
+})
+
 ;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
 ;; loaded via the ISA 3.1 XXSPLTIDP instruction.
 (define_predicate "xxspltidp_operand"
@@ -681,6 +691,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (xxspltidp_operand (op, mode))
 	return true;
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index de1c1ee9a8b..181d20d7e05 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern bool lxvkq_constant_p (rtx, machine_mode, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4f12f9eb968..16c225a604f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6526,6 +6526,10 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* See if we could generate XXSPLTIW directly.  */
+  else if (xxspltiw_operand (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -6533,6 +6537,164 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return true if the argument is a constant vector where all elements are the
+   same.  */
+
+static bool
+const_vector_all_elements_equal_p (rtx op, machine_mode mode)
+{
+  if (!CONST_VECTOR_P (op))
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  if (!CONST_INT_P (element) && !CONST_DOUBLE_P (element))
+    return false;
+
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  return true;
+}
+
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIW instruction.
+
+   Return the constant via CONSTANT_PTR to use in the XXSPLTIW instruction.
+   The assembler does not like negative numbers for XXSPLTIW, so we need to
+   return a 16-bit unsigned value.  */
+
+bool
+xxspltiw_constant_p (rtx op,
+		     machine_mode mode,
+		     HOST_WIDE_INT *constant_ptr)
+{
+  HOST_WIDE_INT value;
+
+  *constant_ptr = 0;
+
+  if (!TARGET_PREFIXED || !TARGET_VSX || !TARGET_XXSPLTIW)
+    return false;
+
+  if (!CONST_VECTOR_P (op))
+    return true;
+
+  rtx element0 = CONST_VECTOR_ELT (op, 0);
+
+  switch (mode)
+    {
+      /* V4SImode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V4SImode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true if we can use the shorter vspltisw instruction.  */
+      value = INTVAL (element0);
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V4SFmode constant vectors that have the same element are
+	 can be used with XXSPLTIW.  */
+    case V4SFmode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true for 0.0f, since that can be created with
+	 xxspltib or xxlxor.  */
+      if (element0 == CONST0_RTX (SFmode))
+	return false;
+
+      value = rs6000_const_f32_to_i32 (element0);
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V8Hmode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V8HImode:
+      if (const_vector_all_elements_equal_p (op, mode))
+	{
+	  /* Don't return true if we can use the shorter vspltish instruction.  */
+	  value = INTVAL (element0);
+	  if (EASY_VECTOR_15 (value))
+	    return false;
+
+	  value &= 0xffff;
+	  *constant_ptr = (value << 16) | value;
+	  return true;
+	}
+
+      else
+	{
+	  /* Check if all even elements are the same and all odd elements are
+	     the same.  */
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+
+	  if (!CONST_INT_P (element1))
+	    return false;
+
+	  for (size_t i = 2; i < GET_MODE_NUNITS (V8HImode); i += 2)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xffff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xffff;
+
+	  if (!BYTES_BIG_ENDIAN)
+	    std::swap (value0, value1);
+
+	  *constant_ptr = (value0 << 16) | value1;
+	  return true;
+	}
+
+      /* V16QI constant vectors that have the first four elements identical to
+	 the next set of 4 elements, and so forth can generate XXSPLTIW.  */
+    case V16QImode:
+	{
+	  if (xxspltib_constant_nosplit (op, mode))
+	    return false;
+
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+	  rtx element2 = CONST_VECTOR_ELT (op, 2);
+	  rtx element3 = CONST_VECTOR_ELT (op, 3);
+
+	  if (!CONST_INT_P (element0) || !CONST_INT_P (element1)
+	      || !CONST_INT_P (element2) || !CONST_INT_P (element3))
+	    return false;
+
+	  for (size_t i = 4; i < GET_MODE_NUNITS (V16QImode); i += 4)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1))
+		|| !rtx_equal_p (element2, CONST_VECTOR_ELT (op, i + 2))
+		|| !rtx_equal_p (element3, CONST_VECTOR_ELT (op, i + 3)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xff;
+	  HOST_WIDE_INT value2 = INTVAL (element2) & 0xff;
+	  HOST_WIDE_INT value3 = INTVAL (element3) & 0xff;
+
+	  if (BYTES_BIG_ENDIAN)
+	    *constant_ptr = ((value0 << 24) | (value1 << 16) | (value2 << 8)
+			     | value3);
+	  else
+	    *constant_ptr = ((value3 << 24) | (value2 << 16) | (value1 << 8)
+			     | value0);
+
+	  return true;
+	}
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
    XXSPLTIDP instruction.
 
@@ -6722,6 +6884,7 @@ output_vec_const_move (rtx *operands)
     {
       bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
       int xxspltib_value = 256;
+      HOST_WIDE_INT xxspltiw_value = 0;
       HOST_WIDE_INT xxspltidp_value = 0;
       int num_insns = -1;
       int lxvkq_immediate = 0;
@@ -6753,6 +6916,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+	{
+	  operands[2] = GEN_INT (xxspltiw_value);
+	  return "xxspltiw %x0,%2";
+	}
+
       if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
 	{
 	  operands[2] = GEN_INT (xxspltidp_value);
@@ -26435,6 +26604,11 @@ prefixed_permute_p (rtx_insn *insn)
 
   switch (mode)
     {
+    case V8HImode:
+    case V4SImode:
+    case V4SFmode:
+      return xxspltiw_operand (src, mode);
+
     case DFmode:
     case SFmode:
     case V2DFmode:
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 016b19e5d2d..ebdf7cd036d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 00f6d0eda14..4f716a9f2d2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,19 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,
+                wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,
+                *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,
+                wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,
+                *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -4666,6 +4666,52 @@
    (set_attr "length" "*,8,*")
    (set_attr "isa" "*,p8v,*")])
 
+;; V8HI/V4SI/V4SF splat immediate constant with XXSPLTIW.  We don't need to add
+;; V16QI since the xxspltib instruction already handles this case.
+(define_insn "*vsx_splat_v8hi_xxspltiw"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V8HI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]) & 0xffff;
+
+  operands[2] = GEN_INT ((value << 16) | value);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4si_xxspltiw"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V4SI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  /* The assembler doesn't like negative numbers.  */
+  operands[2] = GEN_INT (INTVAL (operands[1]) & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4sf_xxspltiw"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand 1 "const_double_operand" "j,F")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (V4SFmode))
+    return "xxlxor %x0,%x0,%x0";
+
+  /* The assembler doesn't like negative numbers.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecsimple,vecperm")
+   (set_attr "prefixed" "*,yes")])
+
 ;; V4SF/V4SI splat from a vector element
 (define_insn "vsx_xxspltw_<mode>"
   [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16HI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSLTPISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSLTPISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSLTPISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSLTPISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSLTPISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimiza V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..6ed60dfcb98 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-


^ permalink raw reply	[flat|nested] 3+ messages in thread

* [gcc(refs/users/meissner/heads/work066)] Generate XXSPLTIW on power10.
@ 2021-08-25  1:10 Michael Meissner
  0 siblings, 0 replies; 3+ messages in thread
From: Michael Meissner @ 2021-08-25  1:10 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:14a1c6fb85f8824bf1a0cf18a2034b461ce0669e

commit 14a1c6fb85f8824bf1a0cf18a2034b461ce0669e
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Tue Aug 24 21:09:39 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-08-24  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (xxspltiw_operand): New predicate.
            (easy_vector_constant): If we can use XXSPLTIW, the vector
            constant is easy.
            * config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't generate a XXSPLTIB and an extend instruction.
            (const_vector_all_elements_equal_p): New function.
            (xxspltiw_constant_p): New function.
            (output_vec_const_move): Add support for loading up vector
            constants with XXSPLTIW.
            (prefixed_permute_p): Recognize xxspltiw instructions as
            prefixed.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            constants loaded with XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            (vsx_splat_v8hi_xxspltiw): New insn.
            (vsx_splat_v4si_xxspltiw): New insn.
            (vsx_splat_v4sf_xxspltiw): New insn.
    
    gcc/testsuite/
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  13 ++
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 174 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  74 +++++++--
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 ++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 ++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 ++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   4 +-
 11 files changed, 465 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 4637345f84b..82fecca4a91 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -218,6 +218,11 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; Vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+  "A vector constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "xxspltiw_operand"))
+
 ;; KF/TF scalar than can be loaded with LXVKQ
 (define_constraint "eQ"
   "An IEEE 128-bit constant that can be loaded with the LXVKQ instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 1a87272604b..b601f73600f 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -650,6 +650,16 @@
   return num_insns == 1;
 })
 
+;; Return 1 if the operand is a CONST_VECTOR that can be loaded with the
+;; XXSPLTIW instruction.
+(define_predicate "xxspltiw_operand"
+  (match_code "const_vector")
+{
+  HOST_WIDE_INT xxspltiw_value = 0;
+
+  return xxspltiw_constant_p (op, mode, &xxspltiw_value);
+})
+
 ;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
 ;; loaded via the ISA 3.1 XXSPLTIDP instruction.
 (define_predicate "xxspltidp_operand"
@@ -681,6 +691,9 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltiw_operand (op, mode))
+	return true;
+
       if (xxspltidp_operand (op, mode))
 	return true;
 
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index de1c1ee9a8b..181d20d7e05 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern bool lxvkq_constant_p (rtx, machine_mode, int *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 4f12f9eb968..16c225a604f 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6526,6 +6526,10 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* See if we could generate XXSPLTIW directly.  */
+  else if (xxspltiw_operand (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -6533,6 +6537,164 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return true if the argument is a constant vector where all elements are the
+   same.  */
+
+static bool
+const_vector_all_elements_equal_p (rtx op, machine_mode mode)
+{
+  if (!CONST_VECTOR_P (op))
+    return false;
+
+  rtx element = CONST_VECTOR_ELT (op, 0);
+  if (!CONST_INT_P (element) && !CONST_DOUBLE_P (element))
+    return false;
+
+  for (size_t i = 1; i < GET_MODE_NUNITS (mode); i++)
+    if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+      return false;
+
+  return true;
+}
+
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIW instruction.
+
+   Return the constant via CONSTANT_PTR to use in the XXSPLTIW instruction.
+   The assembler does not like negative numbers for XXSPLTIW, so we need to
+   return a 16-bit unsigned value.  */
+
+bool
+xxspltiw_constant_p (rtx op,
+		     machine_mode mode,
+		     HOST_WIDE_INT *constant_ptr)
+{
+  HOST_WIDE_INT value;
+
+  *constant_ptr = 0;
+
+  if (!TARGET_PREFIXED || !TARGET_VSX || !TARGET_XXSPLTIW)
+    return false;
+
+  if (!CONST_VECTOR_P (op))
+    return true;
+
+  rtx element0 = CONST_VECTOR_ELT (op, 0);
+
+  switch (mode)
+    {
+      /* V4SImode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V4SImode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true if we can use the shorter vspltisw instruction.  */
+      value = INTVAL (element0);
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V4SFmode constant vectors that have the same element are
+	 can be used with XXSPLTIW.  */
+    case V4SFmode:
+      if (!const_vector_all_elements_equal_p (op, mode))
+	return false;
+
+      /* Don't return true for 0.0f, since that can be created with
+	 xxspltib or xxlxor.  */
+      if (element0 == CONST0_RTX (SFmode))
+	return false;
+
+      value = rs6000_const_f32_to_i32 (element0);
+      *constant_ptr = value & 0xffffffff;
+      return true;
+
+      /* V8Hmode constant vectors that have the same element are can be used
+	 with XXSPLTIW.  */
+    case V8HImode:
+      if (const_vector_all_elements_equal_p (op, mode))
+	{
+	  /* Don't return true if we can use the shorter vspltish instruction.  */
+	  value = INTVAL (element0);
+	  if (EASY_VECTOR_15 (value))
+	    return false;
+
+	  value &= 0xffff;
+	  *constant_ptr = (value << 16) | value;
+	  return true;
+	}
+
+      else
+	{
+	  /* Check if all even elements are the same and all odd elements are
+	     the same.  */
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+
+	  if (!CONST_INT_P (element1))
+	    return false;
+
+	  for (size_t i = 2; i < GET_MODE_NUNITS (V8HImode); i += 2)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xffff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xffff;
+
+	  if (!BYTES_BIG_ENDIAN)
+	    std::swap (value0, value1);
+
+	  *constant_ptr = (value0 << 16) | value1;
+	  return true;
+	}
+
+      /* V16QI constant vectors that have the first four elements identical to
+	 the next set of 4 elements, and so forth can generate XXSPLTIW.  */
+    case V16QImode:
+	{
+	  if (xxspltib_constant_nosplit (op, mode))
+	    return false;
+
+	  rtx element1 = CONST_VECTOR_ELT (op, 1);
+	  rtx element2 = CONST_VECTOR_ELT (op, 2);
+	  rtx element3 = CONST_VECTOR_ELT (op, 3);
+
+	  if (!CONST_INT_P (element0) || !CONST_INT_P (element1)
+	      || !CONST_INT_P (element2) || !CONST_INT_P (element3))
+	    return false;
+
+	  for (size_t i = 4; i < GET_MODE_NUNITS (V16QImode); i += 4)
+	    if (!rtx_equal_p (element0, CONST_VECTOR_ELT (op, i))
+		|| !rtx_equal_p (element1, CONST_VECTOR_ELT (op, i + 1))
+		|| !rtx_equal_p (element2, CONST_VECTOR_ELT (op, i + 2))
+		|| !rtx_equal_p (element3, CONST_VECTOR_ELT (op, i + 3)))
+	      return false;
+
+	  HOST_WIDE_INT value0 = INTVAL (element0) & 0xff;
+	  HOST_WIDE_INT value1 = INTVAL (element1) & 0xff;
+	  HOST_WIDE_INT value2 = INTVAL (element2) & 0xff;
+	  HOST_WIDE_INT value3 = INTVAL (element3) & 0xff;
+
+	  if (BYTES_BIG_ENDIAN)
+	    *constant_ptr = ((value0 << 24) | (value1 << 16) | (value2 << 8)
+			     | value3);
+	  else
+	    *constant_ptr = ((value3 << 24) | (value2 << 16) | (value1 << 8)
+			     | value0);
+
+	  return true;
+	}
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
    XXSPLTIDP instruction.
 
@@ -6722,6 +6884,7 @@ output_vec_const_move (rtx *operands)
     {
       bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
       int xxspltib_value = 256;
+      HOST_WIDE_INT xxspltiw_value = 0;
       HOST_WIDE_INT xxspltidp_value = 0;
       int num_insns = -1;
       int lxvkq_immediate = 0;
@@ -6753,6 +6916,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+	{
+	  operands[2] = GEN_INT (xxspltiw_value);
+	  return "xxspltiw %x0,%2";
+	}
+
       if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
 	{
 	  operands[2] = GEN_INT (xxspltidp_value);
@@ -26435,6 +26604,11 @@ prefixed_permute_p (rtx_insn *insn)
 
   switch (mode)
     {
+    case V8HImode:
+    case V4SImode:
+    case V4SFmode:
+      return xxspltiw_operand (src, mode);
+
     case DFmode:
     case SFmode:
     case V2DFmode:
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 016b19e5d2d..ebdf7cd036d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 00f6d0eda14..4f716a9f2d2 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,19 +1191,19 @@
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
-                wa,        wa,
+                wa,        wa,        wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1215,44 +1215,44 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
-                *,         *,
+                *,         *,         *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
-                *,         *,
+                *,         *,         *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;		XXSPLTIDP  LXVKQ
+;;		XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        wa,
+                wa,        wa,        wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                eF,        eQ,
+                eF,        eQ,        eW,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1264,17 +1264,17 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,
+                *,         *,         *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p10,       p10,
+                p10,       p10,       p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -4666,6 +4666,52 @@
    (set_attr "length" "*,8,*")
    (set_attr "isa" "*,p8v,*")])
 
+;; V8HI/V4SI/V4SF splat immediate constant with XXSPLTIW.  We don't need to add
+;; V16QI since the xxspltib instruction already handles this case.
+(define_insn "*vsx_splat_v8hi_xxspltiw"
+  [(set (match_operand:V8HI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V8HI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  HOST_WIDE_INT value = INTVAL (operands[1]) & 0xffff;
+
+  operands[2] = GEN_INT ((value << 16) | value);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4si_xxspltiw"
+  [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V4SI (match_operand 1 "const_int_operand" "n")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW
+   && !s5bit_cint_operand (operands[1], VOIDmode)"
+{
+  /* The assembler doesn't like negative numbers.  */
+  operands[2] = GEN_INT (INTVAL (operands[1]) & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_insn "*vsx_splat_v4sf_xxspltiw"
+  [(set (match_operand:V4SF 0 "vsx_register_operand" "=wa,wa")
+	(vec_duplicate:V4SF
+	 (match_operand 1 "const_double_operand" "j,F")))]
+  "TARGET_PREFIXED && TARGET_VSX && TARGET_XXSPLTIW"
+{
+  if (operands[1] == CONST0_RTX (V4SFmode))
+    return "xxlxor %x0,%x0,%x0";
+
+  /* The assembler doesn't like negative numbers.  */
+  long value = rs6000_const_f32_to_i32 (operands[1]);
+  operands[2] = GEN_INT (value & 0xffffffff);
+  return "xxspltiw %x0,%2";
+}
+  [(set_attr "type" "vecsimple,vecperm")
+   (set_attr "prefixed" "*,yes")])
+
 ;; V4SF/V4SI splat from a vector element
 (define_insn "vsx_xxspltw_<mode>"
   [(set (match_operand:VSX_W 0 "vsx_register_operand" "=wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16HI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSLTPISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSLTPISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSLTPISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSLTPISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSLTPISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimiza V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..f49ef91422e 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,8 +149,6 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
-
-


^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-08-25  3:41 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-24 22:28 [gcc(refs/users/meissner/heads/work066)] Generate XXSPLTIW on power10 Michael Meissner
2021-08-25  1:10 Michael Meissner
2021-08-25  3:41 Michael Meissner

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).