public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-15 15:15 Michael Meissner
From: Michael Meissner @ 2021-10-15 15:15 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:7b391b707007f5259712d5aa208a3525b0ad1ac5

commit 7b391b707007f5259712d5aa208a3525b0ad1ac5
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Oct 15 11:15:30 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eV constraint added with the XXSPLTIDP patch will also recognize use
    of the XXSPLTIW instruction.  I have not updated the eS constraint because
    I have not yet added support for using XXSPLTIW to load SImode and HImode
    constants into vector registers.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-15  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_prefixed): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add field for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
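
The choice between VSPLTIS* and XXSPLTIW that this patch adds to output_vec_const_move can be sketched in standalone C.  This is an illustration only, not code from the patch: pick_splat_mnemonic, sign_extend, and fits_imm5 are hypothetical names, and fits_imm5 assumes that EASY_VECTOR_15 accepts the range -16..15.

```c
#include <stdint.h>

/* Assumption: same range as GCC's EASY_VECTOR_15, i.e. the 5-bit signed
   immediate accepted by VSPLTISB/VSPLTISH/VSPLTISW.  */
static int
fits_imm5 (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Sign extend the low BITS bits of V with the xor/subtract idiom that the
   patch uses, e.g. ((imm & 0xffff) ^ 0x8000) - 0x8000 for 16 bits.  */
static int64_t
sign_extend (uint32_t v, int bits)
{
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t mask = (sign << 1) - 1;
  return (int64_t) ((v & mask) ^ sign) - (int64_t) sign;
}

/* Hypothetical helper: pick a mnemonic for a 32-bit splat immediate.  The
   shorter Altivec forms are only tried when the destination is an Altivec
   register, in the same order as the patch (word, byte, half word).  */
const char *
pick_splat_mnemonic (uint32_t imm, int dest_is_altivec)
{
  if (dest_is_altivec)
    {
      if (fits_imm5 (sign_extend (imm, 32)))
	return "vspltisw";

      /* All four bytes of the word are equal.  */
      uint32_t byte = imm & 0xff;
      if (imm == byte * 0x01010101u && fits_imm5 (sign_extend (byte, 8)))
	return "vspltisb";

      /* Both half words of the word are equal.  */
      uint32_t half = imm & 0xffff;
      if (imm == half * 0x00010001u && fits_imm5 (sign_extend (half, 16)))
	return "vspltish";
    }

  return "xxspltiw";
}
```

For example, pick_splat_mnemonic (0x05050505, 1) yields "vspltisb", while the same immediate with a non-Altivec destination falls back to "xxspltiw".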

Diff:
---
 gcc/config/rs6000/predicates.md                    |  11 +-
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 120 +++++++++++++++++++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |   4 +-
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 +++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 +++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 10 files changed, 340 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 517ce08f03d..252abbbaf9a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,9 @@
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,7 +647,7 @@
 })
 
 ;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
-;; register with one prefixed instruction, such as XXSPLTIDP.
+;; register with one prefixed instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; We have to have separate predicates and constraints for scalars and vectors,
 ;; otherwise things get messed up with TImode when you try to load very large
@@ -666,6 +669,9 @@
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   return false;
 })
 
@@ -744,6 +750,9 @@
 
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 6e8b81cb134..52f094dd410 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -240,11 +240,13 @@ typedef struct {
   unsigned char bytes[VECTOR_CONST_BYTES];
   machine_mode orig_mode;		/* Original mode.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned int lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d238dd84fe7..838161fb23a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6925,12 +6925,17 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
-  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
-     sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
-    return false;
+  /* See if we could generate vspltisw/vspltish/xxspltiw directly instead of
+     xxspltib + sign extend.  Special case 0/-1 to allow getting any VSX
+     register instead of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0))
+    {
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      if (TARGET_XXSPLTIW && TARGET_PREFIXED && TARGET_VSX)
+	return false;
+    }
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
@@ -7004,6 +7009,52 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
+
+	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  */
+	      if (dest_vmx_p)
+		{
+		  HOST_WIDE_INT sign_imm
+		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+		  if (EASY_VECTOR_15 (sign_imm))
+		    {
+		      operands[2] = GEN_INT (sign_imm);
+		      return "vspltisw %0,%2";
+		    }
+
+		  if (vec_const.bytes[0] == vec_const.bytes[1]
+		      && vec_const.bytes[0] == vec_const.bytes[2]
+		      && vec_const.bytes[0] == vec_const.bytes[3])
+		    {
+		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
+		      if (EASY_VECTOR_15 (sign_imm8))
+			{
+			  operands[2] = GEN_INT (sign_imm8);
+			  return "vspltisb %0,%2";
+			}
+		    }
+
+		  if (vec_const.h_words[0] == vec_const.h_words[1])
+		    {
+		      HOST_WIDE_INT sign_imm16
+			= ((imm & 0xffff) ^ 0x8000) - 0x8000;
+
+		      if (EASY_VECTOR_15 (sign_imm16))
+			{
+			  operands[2] = GEN_INT (sign_imm16);
+			  return "vspltish %0,%2";
+			}
+		    }
+		}
+
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26770,6 +26821,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28784,6 +28838,60 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that all 4 of the 32-bit words are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with VSPLTISW, VSPLTISH,
+     VSPLTISB, or XXSPLTIB).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode))
+    {
+      /* Can we use VSPLTISW to load the constant?  */
+      int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+      if (EASY_VECTOR_15 (sign_value))
+	return false;
+
+      /* Can we use VSPLTISH to load the constant?  */
+      if (vec_const->h_words[0] == vec_const->h_words[1])
+	{
+	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
+	  if (EASY_VECTOR_15 (sign_value16))
+	    return false;
+	}
+
+      /* Can we use XXSPLTIB/VSPLTISB to load the constant?  */
+      if (vec_const->bytes[0] == vec_const->bytes[1]
+	  && vec_const->bytes[0] == vec_const->bytes[2]
+	  && vec_const->bytes[0] == vec_const->bytes[3])
+	return false;
+    }
+
+  /* Record the immediate in the vec_const structure for XXSPLTIW.  */
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a22525000..07b0b671920 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,7 +1192,7 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
@@ -1241,7 +1241,7 @@
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP  LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-21  2:53 Michael Meissner
From: Michael Meissner @ 2021-10-21  2:53 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:2292061afaafff1018f64960b75b354b5b672d04

commit 2292061afaafff1018f64960b75b354b5b672d04
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 22:53:01 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will now also recognize
    use of the XXSPLTIW instruction.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-20  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): Update comment.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (constant_generates_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (constant_generates_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
            switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
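
The filtering that constant_generates_xxspltiw performs in this revision can be approximated by the standalone sketch below.  It is not the GCC implementation: splat_word_immediate and fits_imm5 are hypothetical names, the 16 bytes are assumed to be in big-endian order, and fits_imm5 assumes that EASY_VECTOR_15 accepts the range -16..15.

```c
#include <stdint.h>

/* Assumption: same range as GCC's EASY_VECTOR_15 (5-bit signed immediate).  */
static int
fits_imm5 (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Hypothetical helper: return the 32-bit immediate for XXSPLTIW, or 0 when
   the 16-byte constant is not a word splat or when a shorter instruction
   (XXSPLTIB, VSPLTISH, VSPLTISW) is expected to load it.  */
uint32_t
splat_word_immediate (const uint8_t bytes[16])
{
  /* Every 4-byte word must match the word before it.  */
  for (int i = 4; i < 16; i++)
    if (bytes[i] != bytes[i - 4])
      return 0;

  /* All bytes the same: leave the constant to XXSPLTIB.  */
  if (bytes[0] == bytes[1] && bytes[1] == bytes[2] && bytes[2] == bytes[3])
    return 0;

  uint32_t hi = ((uint32_t) bytes[0] << 8) | bytes[1];
  uint32_t lo = ((uint32_t) bytes[2] << 8) | bytes[3];

  /* All half words the same and small enough: leave it to VSPLTISH.  */
  if (hi == lo)
    {
      int64_t sign_h_word = (int64_t) (hi ^ 0x8000u) - 0x8000;
      if (fits_imm5 (sign_h_word))
	return 0;
    }

  /* Word small enough: leave it to VSPLTISW.  */
  uint32_t word = (hi << 16) | lo;
  int64_t sign_word = (int64_t) (word ^ 0x80000000u) - 0x80000000LL;
  if (fits_imm5 (sign_word))
    return 0;

  return word;
}
```

With these assumptions, the byte pattern 1, 2, 3, 4 repeated four times (as in the new V16QI test) gives the immediate 0x01020304, while a vector of sixteen 1 bytes gives 0 and is left to the byte splat instructions.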

Diff:
---
 gcc/config/rs6000/constraints.md                   |  2 +-
 gcc/config/rs6000/predicates.md                    | 11 +++-
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 54 +++++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 11 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 7d594872a78..0f0513f2171 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -214,7 +214,7 @@
   (match_operand 0 "cint34_operand"))
 
 ;; A SF/DF scalar constant or a vector constant that can be loaded into vector
-;; registers with one prefixed instruction such as XXSPLTIDP.
+;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
 (define_constraint "eP"
   "A constant that can be loaded into a VSX register with one prefixed insn."
   (match_operand 0 "vsx_prefixed_constant"))
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index fefa420ed67..4b07850eb64 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -620,7 +623,7 @@
 
 ;; Return 1 if the operand is a 64-bit floating point scalar constant or a
 ;; vector constant that can be loaded to a VSX register with one prefixed
-;; instruction, such as XXSPLTIDP.
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; In addition to regular constants, we also recognize constants formed with
 ;; the VEC_DUPLICATE insn from scalar constants.
@@ -651,6 +654,9 @@
   if (constant_generates_xxspltidp (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -706,6 +712,9 @@
 	{
 	  if (constant_generates_xxspltidp (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index ec4f78d9241..0b93bc3cc0e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -259,6 +259,7 @@ typedef struct {
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
 extern unsigned constant_generates_xxspltidp (rs6000_const *);
+extern unsigned constant_generates_xxspltiw (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b041db3c728..59b338085b1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7002,6 +7007,13 @@ output_vec_const_move (rtx *operands)
 		  operands[2] = GEN_INT (imm);
 		  return "xxspltidp %x0,%2";
 		}
+
+	      imm = constant_generates_xxspltiw (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltiw %x0,%2";
+		}
 	    }
 	}
 
@@ -26769,6 +26781,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29005,6 +29020,45 @@ constant_generates_xxspltidp (rs6000_const *vsx_const)
   return sf_value;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Only recognize XXSPLTIW for 16-byte vector constants (or 8-byte scalar
+     constants that have been splatted to 128 bits).  */
+  if (vsx_const->total_size != 16)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2633ad9f815..3c94e547939 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 429da57d19d..ec607a7aee7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-float-constant
 Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIDP instruction.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 13+ messages in thread
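
The tests in this patch rely on lane patterns that collapse to one repeated
32-bit word, such as the V8HI even/odd case and the V16QI 1, 2, 3, 4 case.
Below is a minimal sketch of that classification in plain C.  The names
splat_summary, pack_halfwords_be, and summarize_bytes are hypothetical and
exist only for illustration; they are not GCC's rs6000_const or
constant_to_bytes.  The big-endian byte order is also an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a 16-byte vector constant (illustration only).  */
struct splat_summary
{
  bool all_bytes_same;
  bool all_half_words_same;
  bool all_words_same;
  uint32_t word0;	/* First word, bytes taken in big-endian order.  */
};

/* Lay out eight 16-bit elements as 16 bytes, most significant byte first
   (an assumed ordering).  */
void
pack_halfwords_be (const uint16_t hw[8], uint8_t out[16])
{
  assert (hw != NULL && out != NULL);
  for (size_t i = 0; i < 8; i++)
    {
      out[2 * i] = (uint8_t) (hw[i] >> 8);
      out[2 * i + 1] = (uint8_t) (hw[i] & 0xff);
    }
}

/* Record whether every byte, half word, and word of the constant repeats,
   and capture the first 32-bit word.  */
struct splat_summary
summarize_bytes (const uint8_t bytes[16])
{
  struct splat_summary s = { true, true, true, 0 };

  assert (bytes != NULL);
  for (size_t i = 0; i < 16; i++)
    {
      if (bytes[i] != bytes[0])
	s.all_bytes_same = false;
      if (bytes[i] != bytes[i % 2])
	s.all_half_words_same = false;
      if (bytes[i] != bytes[i % 4])
	s.all_words_same = false;
    }

  s.word0 = ((uint32_t) bytes[0] << 24) | ((uint32_t) bytes[1] << 16)
	    | ((uint32_t) bytes[2] << 8) | (uint32_t) bytes[3];
  return s;
}
```

Under these assumptions the V8HI pattern { 1023, 1000, ... } summarizes to the
word 0x03ff03e8, and the V16QI pattern { 1, 2, 3, 4, ... } to 0x01020304.  Both
have all words the same without all half words being the same.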

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-21  2:38 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-21  2:38 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3eb92582ccca03ed0b253ba3baadf8cbfc63d426

commit 3eb92582ccca03ed0b253ba3baadf8cbfc63d426
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 22:37:36 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will now also recognize
    use of the XXSPLTIW instruction.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-20  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): Update comment.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (constant_generates_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (constant_generates_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
            switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/constraints.md                   |  2 +-
 gcc/config/rs6000/predicates.md                    | 11 +++-
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 54 +++++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 11 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 7d594872a78..0f0513f2171 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -214,7 +214,7 @@
   (match_operand 0 "cint34_operand"))
 
 ;; A SF/DF scalar constant or a vector constant that can be loaded into vector
-;; registers with one prefixed instruction such as XXSPLTIDP.
+;; registers with one prefixed instruction such as XXSPLTIDP or XXSPLTIW.
 (define_constraint "eP"
   "A constant that can be loaded into a VSX register with one prefixed insn."
   (match_operand 0 "vsx_prefixed_constant"))
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index fefa420ed67..4b07850eb64 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -620,7 +623,7 @@
 
 ;; Return 1 if the operand is a 64-bit floating point scalar constant or a
 ;; vector constant that can be loaded to a VSX register with one prefixed
-;; instruction, such as XXSPLTIDP.
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; In addition regular constants, we also recognize constants formed with the
 ;; VEC_DUPLICATE insn from scalar constants.
@@ -651,6 +654,9 @@
   if (constant_generates_xxspltidp (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -706,6 +712,9 @@
 	{
 	  if (constant_generates_xxspltidp (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index ec4f78d9241..0b93bc3cc0e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -259,6 +259,7 @@ typedef struct {
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
 extern unsigned constant_generates_xxspltidp (rs6000_const *);
+extern unsigned constant_generates_xxspltiw (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b041db3c728..59b338085b1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7002,6 +7007,13 @@ output_vec_const_move (rtx *operands)
 		  operands[2] = GEN_INT (imm);
 		  return "xxspltidp %x0,%2";
 		}
+
+	      imm = constant_generates_xxspltiw (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltiw %x0,%2";
+		}
 	    }
 	}
 
@@ -26769,6 +26781,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29005,6 +29020,45 @@ constant_generates_xxspltidp (rs6000_const *vsx_const)
   return sf_value;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Only recognize XXSPLTIW for 16-byte vector constants (or 8-byte scalar
+     constants that have been splatted to 128-bits).  */
+  if (vsx_const->total_size != 16)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2633ad9f815..3c94e547939 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 429da57d19d..ec607a7aee7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-float-constant
 Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIDP instruction.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -msplat-word-constant" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -msplat-word-constant" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -msplat-word-constant" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -msplat-word-constant" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */


^ permalink raw reply	[flat|nested] 13+ messages in thread
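
The selection order in constant_generates_xxspltiw above can be modelled
outside of GCC.  The sketch below is a simplified stand-in rather than the
compiler's code: it assumes the 32-bit word is already replicated across all
four lanes, it skips the target-flag and total_size checks, and it assumes
EASY_VECTOR_15 accepts the range -16..15.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed range of the 5-bit signed immediate used by VSPLTISH/VSPLTISW.  */
bool
fits_splat_imm5 (int64_t v)
{
  return v >= -16 && v <= 15;
}

/* Sign-extend the low 16 bits with the xor/subtract idiom from the patch.  */
int64_t
sext16 (uint32_t v)
{
  return (int64_t) ((v & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Sign-extend a 32-bit word the same way, widening first so that the
   subtraction cannot overflow.  */
int64_t
sext32 (uint32_t v)
{
  return (int64_t) (v ^ 0x80000000u) - INT64_C (0x80000000);
}

/* Return the XXSPLTIW immediate for a word replicated in every lane, or 0
   when a shorter instruction would be preferred.  */
uint32_t
model_xxspltiw_imm (uint32_t word)
{
  uint32_t hi = word >> 16;
  uint32_t lo = word & 0xffffu;
  uint32_t byte = word & 0xffu;

  /* All four bytes equal: XXSPLTIB handles it.  */
  if (word == byte * 0x01010101u)
    return 0;

  /* Both half words equal and small: VSPLTISH handles it.  */
  if (hi == lo && fits_splat_imm5 (sext16 (lo)))
    return 0;

  /* Small word value: VSPLTISW handles it.  */
  if (fits_splat_imm5 (sext32 (word)))
    return 0;

  return word;
}
```

Under this model a splat of 1 returns 0, because VSPLTISW is preferred, while
126 and the bit pattern of 1.0f (0x3f800000) come back unchanged as the
immediate.  As in the original function, a return value of 0 doubles as the
"cannot use XXSPLTIW" signal, which is safe because an all-zero word is
already rejected by the all-bytes-same check.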

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-21  0:40 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-21  0:40 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:3914142e562b537493e17108fb15c53c8a96b153

commit 3914142e562b537493e17108fb15c53c8a96b153
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 20:39:45 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will now also recognize
    use of the XXSPLTIW instruction.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-20  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (constant_generates_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (constant_generates_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
            switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    | 11 +++-
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 49 ++++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 10 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index fefa420ed67..4b07850eb64 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -620,7 +623,7 @@
 
 ;; Return 1 if the operand is a 64-bit floating point scalar constant or a
 ;; vector constant that can be loaded to a VSX register with one prefixed
-;; instruction, such as XXSPLTIDP.
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; In addition regular constants, we also recognize constants formed with the
 ;; VEC_DUPLICATE insn from scalar constants.
@@ -651,6 +654,9 @@
   if (constant_generates_xxspltidp (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -706,6 +712,9 @@
 	{
 	  if (constant_generates_xxspltidp (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index ec4f78d9241..0b93bc3cc0e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -259,6 +259,7 @@ typedef struct {
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
 extern unsigned constant_generates_xxspltidp (rs6000_const *);
+extern unsigned constant_generates_xxspltiw (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b041db3c728..4f24d9491da 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7002,6 +7007,13 @@ output_vec_const_move (rtx *operands)
 		  operands[2] = GEN_INT (imm);
 		  return "xxspltidp %x0,%2";
 		}
+
+	      imm = constant_generates_xxspltiw (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltiw %x0,%2";
+		}
 	    }
 	}
 
@@ -26769,6 +26781,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29005,6 +29020,40 @@ constant_generates_xxspltidp (rs6000_const *vsx_const)
   return sf_value;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2633ad9f815..3c94e547939 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 429da57d19d..ec607a7aee7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-float-constant
 Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIDP instruction.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..27764ddbc83
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..1f0475cf47a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..02d0c6d66a2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VEXTSB2W) is not generated.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..59418d3bb0a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VUPKLSB) is not generated.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-20 22:58 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-20 22:58 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:de2c9a286a44b76c7d68424f0018a252dfe7608c

commit de2c9a286a44b76c7d68424f0018a252dfe7608c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 18:58:09 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will now also recognize
    use of the XXSPLTIW instruction.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-20  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (constant_generates_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (constant_generates_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
            switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
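
As a reading aid for the rs6000.c entries above, the selection logic of constant_generates_xxspltiw can be modelled outside the compiler. The following is only a portable sketch, not GCC code: `model_const`, `model_from_bytes`, `model_easy_vector_15`, and `model_generates_xxspltiw` are hypothetical names, bytes are assumed to be stored most-significant-first, the target checks (TARGET_SPLAT_WORD_CONSTANT, TARGET_PREFIXED, TARGET_VSX) are left out, and EASY_VECTOR_15 is assumed to accept the range -16..15. It does not model any other predicate that may claim a constant before this function is reached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the rs6000_const fields the patch reads.  */
typedef struct {
  uint32_t words[4];
  uint16_t half_words[8];
  bool all_words_same;
  bool all_half_words_same;
  bool all_bytes_same;
} model_const;

/* Assumed equivalent of EASY_VECTOR_15: fits a 5-bit signed immediate.  */
static bool
model_easy_vector_15 (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Fill the model from 16 bytes, assumed most-significant byte first.  */
static void
model_from_bytes (model_const *c, const uint8_t b[16])
{
  assert (c != NULL && b != NULL);

  for (int i = 0; i < 8; i++)
    c->half_words[i] = (uint16_t) ((b[2 * i] << 8) | b[2 * i + 1]);
  for (int i = 0; i < 4; i++)
    c->words[i] = ((uint32_t) c->half_words[2 * i] << 16)
                  | c->half_words[2 * i + 1];

  c->all_bytes_same = true;
  for (int i = 1; i < 16; i++)
    if (b[i] != b[0])
      c->all_bytes_same = false;

  c->all_half_words_same = true;
  for (int i = 1; i < 8; i++)
    if (c->half_words[i] != c->half_words[0])
      c->all_half_words_same = false;

  c->all_words_same = true;
  for (int i = 1; i < 4; i++)
    if (c->words[i] != c->words[0])
      c->all_words_same = false;
}

/* Return the 32-bit immediate, or 0 if a cheaper splat would be used.  */
static uint32_t
model_generates_xxspltiw (const model_const *c)
{
  if (!c->all_words_same)
    return 0;

  /* A byte splat is left to XXSPLTIB.  */
  if (c->all_bytes_same)
    return 0;

  /* Small half-word splats are left to VSPLTISH.  */
  if (c->all_half_words_same)
    {
      int64_t h = (int64_t) (c->half_words[0] ^ 0x8000) - 0x8000;
      if (model_easy_vector_15 (h))
        return 0;
    }

  /* Small word splats are left to VSPLTISW.  */
  int64_t w = (int64_t) (c->words[0] ^ 0x80000000u) - INT64_C (0x80000000);
  if (model_easy_vector_15 (w))
    return 0;

  return c->words[0];
}
```

Under these assumptions the byte pattern 1, 2, 3, 4 repeated four times yields the immediate 0x01020304, while a splat of the word 1 yields 0, which matches the instruction choices annotated in the new tests.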

Diff:
---
 gcc/config/rs6000/predicates.md                    | 11 +++-
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 49 ++++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 10 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index fefa420ed67..4b07850eb64 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -620,7 +623,7 @@
 
 ;; Return 1 if the operand is a 64-bit floating point scalar constant or a
 ;; vector constant that can be loaded to a VSX register with one prefixed
-;; instruction, such as XXSPLTIDP.
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; In addition regular constants, we also recognize constants formed with the
 ;; VEC_DUPLICATE insn from scalar constants.
@@ -651,6 +654,9 @@
   if (constant_generates_xxspltidp (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -706,6 +712,9 @@
 	{
 	  if (constant_generates_xxspltidp (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index ec4f78d9241..0b93bc3cc0e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -259,6 +259,7 @@ typedef struct {
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
 extern unsigned constant_generates_xxspltidp (rs6000_const *);
+extern unsigned constant_generates_xxspltiw (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b041db3c728..4f24d9491da 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7002,6 +7007,13 @@ output_vec_const_move (rtx *operands)
 		  operands[2] = GEN_INT (imm);
 		  return "xxspltidp %x0,%2";
 		}
+
+	      imm = constant_generates_xxspltiw (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltiw %x0,%2";
+		}
 	    }
 	}
 
@@ -26769,6 +26781,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29005,6 +29020,40 @@ constant_generates_xxspltidp (rs6000_const *vsx_const)
   return sf_value;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 2633ad9f815..3c94e547939 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 429da57d19d..ec607a7aee7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-float-constant
 Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIDP instruction.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VEXTSB2W) is not generated.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 sequence (XXSPLTIB/VUPKLSB) is not generated.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-20 22:24 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-20 22:24 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:2eb301c9c41f24f22f5787ddba451860ea19bbab

commit 2eb301c9c41f24f22f5787ddba451860ea19bbab
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 18:23:40 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will now also recognize
    use of the XXSPLTIW instruction.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
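
To make the test inputs more concrete, the sketch below shows how the elements of some of the new test vectors would collapse into a single 32-bit word. It is a portable illustration under stated assumptions: the helper names `pack_bytes`, `pack_half_words`, and `float_bits` are hypothetical, elements are packed most-significant-first, and `float` is assumed to be IEEE 754 binary32. How the compiler itself lays out the bytes is not shown here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The bit-copy below only makes sense if float is 32 bits wide.  */
static_assert (sizeof (float) == sizeof (uint32_t), "float must be 32 bits");

/* Pack four byte elements into one word, first element most significant.  */
static uint32_t
pack_bytes (uint8_t b0, uint8_t b1, uint8_t b2, uint8_t b3)
{
  return ((uint32_t) b0 << 24) | ((uint32_t) b1 << 16)
         | ((uint32_t) b2 << 8) | b3;
}

/* Pack two half-word elements into one word, first element most
   significant.  */
static uint32_t
pack_half_words (uint16_t h0, uint16_t h1)
{
  return ((uint32_t) h0 << 16) | h1;
}

/* Return the bit pattern of a float without violating aliasing rules.  */
static uint32_t
float_bits (float f)
{
  uint32_t bits;
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

With these helpers, the even/odd pattern 1023, 1000 from v8hi_const_1023_1000 corresponds to the word 0x03ff03e8, the byte pattern 1, 2, 3, 4 from v16qi_const_2 to 0x01020304, and 1.0f to 0x3f800000.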
    
    2021-10-20  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (constant_generates_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (constant_generates_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-msplat-word-constant): New debug
            switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
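
The new constant_generates_xxspltiw function sign-extends the first half word and the first word before comparing them with the VSPLTISH/VSPLTISW immediate range, using a mask, XOR, and subtract idiom. The following standalone sketch illustrates that idiom only; the name `sign_extend` is hypothetical, and the use of 64-bit intermediates is a choice made here for portability rather than something taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low BITS bits of VALUE: keep only those bits, flip the
   sign bit with XOR, then subtract the sign bit's weight again.  */
static int64_t
sign_extend (uint64_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 63);

  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t mask = (sign << 1) - 1;

  return (int64_t) ((value & mask) ^ sign) - (int64_t) sign;
}
```

For example, `sign_extend (0xffff, 16)` gives -1 and `sign_extend (0xfffffff0u, 32)` gives -16, both of which fall inside a 5-bit signed immediate range, whereas `sign_extend (0x03ff, 16)` stays at 1023.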

Diff:
---
 gcc/config/rs6000/predicates.md                    | 11 +++-
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 49 ++++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 10 files changed, 289 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index b97b20a1c88..e6b48fadbfd 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -609,6 +609,9 @@
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -621,7 +624,7 @@
 
 ;; Return 1 if the operand is a 64-bit floating point scalar constant or a
 ;; vector constant that can be loaded to a VSX register with one prefixed
-;; instruction, such as XXSPLTIDP.
+;; instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; In addition regular constants, we also recognize constants formed with the
 ;; VEC_DUPLICATE insn from scalar constants.
@@ -652,6 +655,9 @@
   if (constant_generates_xxspltidp (&vsx_const))
     return true;
 
+  if (constant_generates_xxspltiw (&vsx_const))
+    return true;
+
   return false;
 })
 
@@ -708,6 +714,9 @@
 	{
 	  if (constant_generates_xxspltidp (&vsx_const))
 	    return true;
+
+	  if (constant_generates_xxspltiw (&vsx_const))
+	    return true;
 	}
 
       if (TARGET_P9_VECTOR
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index ec4f78d9241..0b93bc3cc0e 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -259,6 +259,7 @@ typedef struct {
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
 extern unsigned constant_generates_xxspltidp (rs6000_const *);
+extern unsigned constant_generates_xxspltiw (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 40c7e5ceddf..9889e42241c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW or XXSPLTIDP, don't generate XXSPLTIB and a
+     sign extend operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -7002,6 +7007,13 @@ output_vec_const_move (rtx *operands)
 		  operands[2] = GEN_INT (imm);
 		  return "xxspltidp %x0,%2";
 		}
+
+	      imm = constant_generates_xxspltiw (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltiw %x0,%2";
+		}
 	    }
 	}
 
@@ -26769,6 +26781,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (constant_generates_xxspltidp (&vsx_const))
 	return true;
+
+      if (constant_generates_xxspltiw (&vsx_const))
+	return true;
     }
 
   return false;
@@ -29001,6 +29016,40 @@ constant_generates_xxspltidp (rs6000_const *vsx_const)
   return sf_value;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  Return zero if
+   the XXSPLTIW instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIW instruction.  */
+
+unsigned
+constant_generates_xxspltiw (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_WORD_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  if (!vsx_const->all_words_same)
+    return 0;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vsx_const->all_bytes_same)
+    return 0;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vsx_const->all_half_words_same)
+    {
+      unsigned short h_word = vsx_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return 0;
+    }
+
+  unsigned int word = vsx_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return 0;
+
+  return vsx_const->words[0];
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 218645aa240..19cd1ba7022 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltiw %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  imm = constant_generates_xxspltiw (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 429da57d19d..ec607a7aee7 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ msplat-float-constant
 Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
 Generate (do not generate) code that uses the XXSPLTIDP instruction.
 
+msplat-word-constant
+Target Var(TARGET_SPLAT_WORD_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIW instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
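
The selection logic that vec_const_use_xxspltiw applies to the splatted word
can be tried outside the compiler.  The following is only a sketch under
stated assumptions: the helper names (fits_splat_imm5, sign_extend_16,
sign_extend_32, word_splat_wants_xxspltiw) are hypothetical and do not exist
in GCC, the target checks (TARGET_XXSPLTIW, TARGET_PREFIXED, TARGET_VSX) are
left out, and the constant is assumed to already consist of four identical
32-bit words.  The -16..15 range is what EASY_VECTOR_15 is understood to
accept.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed equivalent of EASY_VECTOR_15: the value fits the 5-bit signed
   immediate of VSPLTISB/VSPLTISH/VSPLTISW.  */
static bool
fits_splat_imm5 (int32_t value)
{
  return value >= -16 && value <= 15;
}

/* Sign extend the low 16 bits with the XOR/subtract idiom from the patch.  */
static int32_t
sign_extend_16 (uint32_t value)
{
  return (int32_t) ((value & 0xffffu) ^ 0x8000u) - 0x8000;
}

/* Sign extend 32 bits; the arithmetic is done in 64 bits so that no step
   relies on implementation-defined conversions.  */
static int32_t
sign_extend_32 (uint32_t value)
{
  return (int32_t) ((int64_t) (value ^ 0x80000000u) - 0x80000000LL);
}

/* Hypothetical stand-in for the word-level checks: return true when a
   vector made of four copies of WORD would be a candidate for XXSPLTIW.  */
static bool
word_splat_wants_xxspltiw (uint32_t word)
{
  /* All bytes identical: XXSPLTIB is enough.  */
  if (word == (word & 0xffu) * 0x01010101u)
    return false;

  /* Both half words identical and small: VSPLTISH is enough.  */
  if ((word >> 16) == (word & 0xffffu)
      && fits_splat_imm5 (sign_extend_16 (word)))
    return false;

  /* Word is small: VSPLTISW is enough.  */
  if (fits_splat_imm5 (sign_extend_32 (word)))
    return false;

  return true;
}
```

With these helpers, the word 0x01020304 (the V16QI pattern 1, 2, 3, 4 in
big-endian byte order) and the word 126 are candidates, while 1 and
0x00010001 are not, which lines up with the expectations in the new tests.
The sketch does not model the separate easy_altivec_constant check in the
predicate, so it says nothing about cases such as -0.0f.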



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-18 18:51 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-18 18:51 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:4af9bc8155455aade656a31a820d222346406501

commit 4af9bc8155455aade656a31a820d222346406501
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 18 14:51:29 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will also recognize use
    of the XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-18  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (vsx_prefixed_constant): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (vec_const_use_xxspltiw): New
            declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIW_CONST): New unspec.
            (xxspltiw_<mode>_internal): New insns.
            (VSX prefixed constant splitter): Add XXSPLTIW support.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    | 14 +++++
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 43 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 gcc/config/rs6000/vsx.md                           |  4 +-
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 11 files changed, 289 insertions(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 4b2bbdf40e8..40c4cba68ff 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,6 +647,14 @@
   if (!vec_const_to_bytes (op, mode, &vec_const))
     return false;
   
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+	   
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
@@ -706,6 +717,9 @@
 	{
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8eef955237a..b12f6b10c13 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -249,6 +249,7 @@ typedef struct {
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 353ec2b572d..20226169ba2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW, don't generate XXSPLTIB and a sign extend
+     operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -6998,6 +7003,12 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.words[0]);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -28784,6 +28795,38 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const->all_words_same)
+    return false;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vec_const->all_bytes_same)
+    return false;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vec_const->all_half_words_same)
+    {
+      unsigned short h_word = vec_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return false;
+    }
+
+  unsigned int word = vec_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return false;
+
+  return true;
+}
+
 /* Convert a vector constant to an internal structure, breaking it out to
    bytes, half words, words, and double words.  Return true if we have
    successfully broken it out.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 5d830e0db15..1963eb01ed7 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    {
+      rtx imm = GEN_INT (vec_const.words[0]);
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, imm));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 1d7ce4cc94a..332f61be0ba 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index c8518496339..0ceecc1975c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,7 +1192,7 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP
+;;              XXSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
@@ -1241,7 +1241,7 @@
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP
+;;              XXSPLTI*
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
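
For the V4SF tests, the immediate passed to XXSPLTIW is the 32-bit image of
the float, which the patch takes from vec_const.words[0].  As a rough
illustration of what that immediate looks like, the sketch below reinterprets
a float as a word.  It assumes float is IEEE 754 binary32, and float_to_word
is a hypothetical helper name, not something defined in GCC.

```c
#include <stdint.h>
#include <string.h>

/* The sketch only makes sense when float is 32 bits wide.  */
_Static_assert (sizeof (float) == sizeof (uint32_t),
		"float is assumed to be 32 bits");

/* Return the bit pattern of VALUE as a 32-bit word.  memcpy is used so the
   reinterpretation does not violate aliasing rules.  */
static uint32_t
float_to_word (float value)
{
  uint32_t word;
  memcpy (&word, &value, sizeof word);
  return word;
}
```

Under that assumption, 1.0f maps to 0x3f800000 and -0.0f maps to 0x80000000.
The second value is the one the tests expect to be built with XXSPLTIB and
VSLW rather than with a prefixed instruction.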



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-18 18:47 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-18 18:47 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:0417e8294118964a91de8cc01465e6011b390911

commit 0417e8294118964a91de8cc01465e6011b390911
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Mon Oct 18 14:46:52 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eP constraint added with the XXSPLTIDP patch will also recognize use
    of the XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-18  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_prefixed): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add field for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_simple_constant): New function.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    | 14 +++++
 gcc/config/rs6000/rs6000-protos.h                  |  1 +
 gcc/config/rs6000/rs6000.c                         | 43 ++++++++++++++
 gcc/config/rs6000/rs6000.md                        | 17 ++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |  2 +-
 10 files changed, 287 insertions(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 4b2bbdf40e8..40c4cba68ff 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -608,6 +608,9 @@
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,6 +647,14 @@
   if (!vec_const_to_bytes (op, mode, &vec_const))
     return false;
   
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+	   
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
@@ -706,6 +717,9 @@
 	{
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8eef955237a..b12f6b10c13 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -249,6 +249,7 @@ typedef struct {
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 353ec2b572d..20226169ba2 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6939,6 +6939,11 @@ xxspltib_constant_p (rtx op,
   else if (IN_RANGE (value, -1, 0))
     *num_insns_ptr = 1;
 
+  /* If we can generate XXSPLTIW, don't generate XXSPLTIB and a sign extend
+     operation.  */
+  else if (vsx_prefixed_constant (op, mode))
+    return false;
+
   else
     *num_insns_ptr = 2;
 
@@ -6998,6 +7003,12 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.words[0]);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -28784,6 +28795,38 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const->all_words_same)
+    return false;
+
+  /* If we can use XXSPLTIB, don't generate XXSPLTIW.  */
+  if (vec_const->all_bytes_same)
+    return false;
+
+  /* See if we can use VSPLTISH or VSPLTISW.  */
+  if (vec_const->all_half_words_same)
+    {
+      unsigned short h_word = vec_const->half_words[0];
+      short sign_h_word = ((h_word & 0xffff) ^ 0x8000) - 0x8000;
+      if (EASY_VECTOR_15 (sign_h_word))
+	return false;
+    }
+
+  unsigned int word = vec_const->words[0];
+  int sign_word = ((word & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  if (EASY_VECTOR_15 (sign_word))
+    return false;
+
+  return true;
+}
+
 /* Convert a vector constant to an internal structure, breaking it out to
    bytes, half words, words, and double words.  Return true if we have
    successfully broken it out.  */
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 5d830e0db15..1963eb01ed7 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -157,6 +157,7 @@
    UNSPEC_HASHST
    UNSPEC_HASHCHK
    UNSPEC_XXSPLTIDP_CONST
+   UNSPEC_XXSPLTIW_CONST
   ])
 
 ;;
@@ -8232,6 +8233,15 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+(define_insn "xxspltiw_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIW_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_split
   [(set (match_operand:SFDF 0 "vsx_register_operand")
 	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
@@ -8252,6 +8262,13 @@
       DONE;
     }
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    {
+      rtx imm = GEN_INT (vec_const.words[0]);
+      emit_insn (gen_xxspltiw_<mode>_internal (dest, imm));
+      DONE;
+    }
+
   else
     gcc_unreachable ();
 })
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 1d7ce4cc94a..332f61be0ba 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
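
For readers following the test expectations above, the choice between XXSPLTIW
and the shorter splat instructions can be approximated by the sketch below.  It
is a standalone illustration, not code from the patch: classify_int_splat,
easy_vector_15, and enum splat_kind are hypothetical names, the byte array is
assumed to hold the constant most-significant byte first, and only integer
constants are handled.  It follows the integer-mode checks that
vec_const_use_xxspltiw performs in one revision of this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not part of GCC.  */
enum splat_kind
{
  SPLAT_NONE,		/* The four 32-bit words differ.  */
  SPLAT_SHORT,		/* Zero, or a non-prefixed splat is enough.  */
  SPLAT_XXSPLTIW	/* XXSPLTIW is the candidate.  */
};

/* Range of the 5-bit signed immediate taken by VSPLTISB/H/W.  */
static bool
easy_vector_15 (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Classify a 16-byte integer vector constant.  On SPLAT_XXSPLTIW, *IMM
   receives the 32-bit immediate.  Assumption: BYTES is big-endian.  */
static enum splat_kind
classify_int_splat (const unsigned char bytes[16], uint32_t *imm)
{
  uint32_t words[4];

  for (int i = 0; i < 4; i++)
    words[i] = ((uint32_t) bytes[4 * i] << 24
		| (uint32_t) bytes[4 * i + 1] << 16
		| (uint32_t) bytes[4 * i + 2] << 8
		| (uint32_t) bytes[4 * i + 3]);

  /* All four 32-bit words must match.  */
  uint32_t value = words[0];
  if (value != words[1] || value != words[2] || value != words[3])
    return SPLAT_NONE;

  /* Zero is cheap to create without a prefixed instruction.  */
  if (value == 0)
    return SPLAT_SHORT;

  /* Would VSPLTISW do?  Sign extend the 32-bit word first.  */
  int64_t sign_value = (int64_t) (value ^ 0x80000000u) - 0x80000000LL;
  if (easy_vector_15 (sign_value))
    return SPLAT_SHORT;

  /* Would VSPLTISH do?  Both halfwords must match.  */
  uint32_t low16 = value & 0xffff;
  if ((value >> 16) == low16)
    {
      int64_t sign_value16 = (int64_t) (low16 ^ 0x8000u) - 0x8000;
      if (easy_vector_15 (sign_value16))
	return SPLAT_SHORT;
    }

  /* Would XXSPLTIB/VSPLTISB do?  All four bytes must match.  */
  if (bytes[0] == bytes[1] && bytes[0] == bytes[2] && bytes[0] == bytes[3])
    return SPLAT_SHORT;

  *imm = value;
  return SPLAT_XXSPLTIW;
}
```

Under the byte-order assumption stated above, the V16QI constant
{ 1, 2, 3, 4, ... } is classified as XXSPLTIW with immediate 0x01020304, while
a splat of (short) 1 is left to the VSPLTISH path.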



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-16  1:36 Michael Meissner
From: Michael Meissner @ 2021-10-16  1:36 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:66c5ef8a3e48d022a1a434b8473f6396be2a9b62

commit 66c5ef8a3e48d022a1a434b8473f6396be2a9b62
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Oct 15 21:35:41 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eV constraint added with the XXSPLTIDP patch will also recognize
    constants that can be loaded with XXSPLTIW.  I have not updated the eS
    constraint, because this patch does not yet use XXSPLTIW to load SImode
    and HImode constants into vector registers.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-15  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_prefixed): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add field for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_simple_constant): New function.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  11 +-
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 120 +++++++++++++++++++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |   4 +-
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 +++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 +++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 10 files changed, 340 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 517ce08f03d..252abbbaf9a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,9 @@
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,7 +647,7 @@
 })
 
 ;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
-;; register with one prefixed instruction, such as XXSPLTIDP.
+;; register with one prefixed instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; We have to have separate predicates and constraints for scalars and vectors,
 ;; otherwise things get messed up with TImode when you try to load very large
@@ -666,6 +669,9 @@
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   return false;
 })
 
@@ -744,6 +750,9 @@
 
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 6e8b81cb134..52f094dd410 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -240,11 +240,13 @@ typedef struct {
   unsigned char bytes[VECTOR_CONST_BYTES];
   machine_mode orig_mode;		/* Original mode.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned int lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d238dd84fe7..838161fb23a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6925,12 +6925,17 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
-  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
-     sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
-    return false;
+  /* See if we could generate vspltisw/vspltish/xxspltiw directly instead of
+     xxspltib + sign extend.  Special case 0/-1 to allow getting any VSX
+     register instead of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0))
+    {
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      if (TARGET_XXSPLTIW && TARGET_PREFIXED && TARGET_VSX)
+	return false;
+    }
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
@@ -7004,6 +7009,52 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
+
+	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  */
+	      if (dest_vmx_p)
+		{
+		  HOST_WIDE_INT sign_imm
+		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+		  if (EASY_VECTOR_15 (sign_imm))
+		    {
+		      operands[2] = GEN_INT (sign_imm);
+		      return "vspltisw %0,%2";
+		    }
+
+		  if (vec_const.bytes[0] == vec_const.bytes[1]
+		      && vec_const.bytes[0] == vec_const.bytes[2]
+		      && vec_const.bytes[0] == vec_const.bytes[3])
+		    {
+		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
+		      if (EASY_VECTOR_15 (sign_imm8))
+			{
+			  operands[2] = GEN_INT (sign_imm8);
+			  return "vspltisb %0,%2";
+			}
+		    }
+
+		  if (vec_const.h_words[0] == vec_const.h_words[1])
+		    {
+		      HOST_WIDE_INT sign_imm16
+			= ((imm & 0xffff) ^ 0x8000) - 0x8000;
+
+		      if (EASY_VECTOR_15 (sign_imm16))
+			{
+			  operands[2] = GEN_INT (sign_imm16);
+			  return "vspltish %0,%2";
+			}
+		    }
+		}
+
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26770,6 +26821,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28784,6 +28838,60 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that all four 32-bit words are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with VSPLTISW, VSPLTISH,
+     VSPLTISB, or XXSPLTIB).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode))
+    {
+      /* Can we use VSPLTISW to load the constant?  */
+      int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+      if (EASY_VECTOR_15 (sign_value))
+	return false;
+
+      /* Can we use VSPLTISH to load the constant?  */
+      if (vec_const->h_words[0] == vec_const->h_words[1])
+	{
+	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
+	  if (EASY_VECTOR_15 (sign_value16))
+	    return false;
+	}
+
+      /* Can we use XXSPLTIB/VSPLTISB to load the constant?  */
+      if (vec_const->bytes[0] == vec_const->bytes[1]
+	  && vec_const->bytes[0] == vec_const->bytes[2]
+	  && vec_const->bytes[0] == vec_const->bytes[3])
+	return false;
+    }
+
+  /* Record the immediate in the vec_const structure for XXSPLTIW.  */
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a22525000..07b0b671920 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,7 +1192,7 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
@@ -1241,7 +1241,7 @@
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP  LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
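
The output_vec_const_move hunk above relies on the expression
((imm & MASK) ^ SIGN) - SIGN to sign extend an 8-, 16-, or 32-bit field before
testing it against the VSPLTIS* immediate range.  The following standalone
sketch shows that idiom and the order in which the shorter instructions are
tried when the destination is an Altivec register.  The names sign_extend,
easy_vector_15, and pick_vmx_splat are hypothetical, and the function works on
the 32-bit immediate alone rather than on GCC's rs6000_vec_const structure.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign extend the low BITS bits of IMM (BITS is 8, 16, or 32).  */
static int64_t
sign_extend (uint64_t imm, unsigned int bits)
{
  uint64_t sign = (uint64_t) 1 << (bits - 1);
  uint64_t mask = (sign << 1) - 1;

  /* Flipping the sign bit and then subtracting it maps the field
     0 .. 2^BITS - 1 onto -2^(BITS-1) .. 2^(BITS-1) - 1.  */
  return (int64_t) ((imm & mask) ^ sign) - (int64_t) sign;
}

/* Range of the 5-bit signed immediate taken by VSPLTISB/H/W.  */
static bool
easy_vector_15 (int64_t n)
{
  return n >= -16 && n <= 15;
}

/* Return the mnemonic this sketch would pick for a word splat of IMM
   into an Altivec register: word first, then byte, then halfword,
   with XXSPLTIW as the fallback.  */
static const char *
pick_vmx_splat (uint32_t imm)
{
  unsigned int b0 = imm >> 24;
  unsigned int b1 = (imm >> 16) & 0xff;
  unsigned int b2 = (imm >> 8) & 0xff;
  unsigned int b3 = imm & 0xff;

  if (easy_vector_15 (sign_extend (imm, 32)))
    return "vspltisw";

  if (b0 == b1 && b0 == b2 && b0 == b3
      && easy_vector_15 (sign_extend (imm, 8)))
    return "vspltisb";

  if ((imm >> 16) == (imm & 0xffff)
      && easy_vector_15 (sign_extend (imm, 16)))
    return "vspltish";

  return "xxspltiw";
}
```

For example, the bit pattern of 1.0f (0x3f800000) matches none of the short
forms and falls through to xxspltiw, which agrees with the V4SF test
expectations.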



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-15  5:29 Michael Meissner
From: Michael Meissner @ 2021-10-15  5:29 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:40f197fd7e541e118a2d390cd5d6341d97ccb4de

commit 40f197fd7e541e118a2d390cd5d6341d97ccb4de
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Oct 15 01:29:07 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eV constraint added with the XXSPLTIDP patch will also recognize
    constants that can be loaded with XXSPLTIW.  I have not updated the eS
    constraint, because this patch does not yet use XXSPLTIW to load SImode
    and HImode constants into vector registers.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-15  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_prefixed): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add field for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  11 +-
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 119 +++++++++++++++++++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |   4 +-
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 +++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 ++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 +++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 10 files changed, 339 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 517ce08f03d..252abbbaf9a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,9 @@
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,7 +647,7 @@
 })
 
 ;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
-;; register with one prefixed instruction, such as XXSPLTIDP.
+;; register with one prefixed instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; We have to have separate predicates and constraints for scalars and vectors,
 ;; otherwise things get messed up with TImode when you try to load very large
@@ -666,6 +669,9 @@
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   return false;
 })
 
@@ -744,6 +750,9 @@
 
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 6e8b81cb134..52f094dd410 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -240,11 +240,13 @@ typedef struct {
   unsigned char bytes[VECTOR_CONST_BYTES];
   machine_mode orig_mode;		/* Original mode.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned int lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d238dd84fe7..4400e344787 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6925,12 +6925,17 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
-  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
-     sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
-    return false;
+  /* See if we could generate vspltisw/vspltish/xxspltiw directly instead of
+     xxspltib + sign extend.  Special case 0/-1 to allow getting any VSX
+     register instead of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0))
+    {
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      if (TARGET_XXSPLTIW && TARGET_PREFIXED && TARGET_VSX)
+	return false;
+    }
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
@@ -7004,6 +7009,52 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
+
+	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  */
+	      if (dest_vmx_p)
+		{
+		  HOST_WIDE_INT sign_imm
+		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+		  if (EASY_VECTOR_15 (sign_imm))
+		    {
+		      operands[2] = GEN_INT (sign_imm);
+		      return "vspltisw %0,%2";
+		    }
+
+		  if (vec_const.bytes[0] == vec_const.bytes[1]
+		      && vec_const.bytes[0] == vec_const.bytes[2]
+		      && vec_const.bytes[0] == vec_const.bytes[3])
+		    {
+		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
+		      if (EASY_VECTOR_15 (sign_imm8))
+			{
+			  operands[2] = GEN_INT (sign_imm8);
+			  return "vspltisb %0,%2";
+			}
+		    }
+
+		  if (vec_const.h_words[0] == vec_const.h_words[1])
+		    {
+		      HOST_WIDE_INT sign_imm16
+			= ((imm & 0xffff) ^ 0x8000) - 0x8000;
+
+		      if (EASY_VECTOR_15 (sign_imm16))
+			{
+			  operands[2] = GEN_INT (sign_imm16);
+			  return "vspltish %0,%2";
+			}
+		    }
+		}
+
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26770,6 +26821,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28784,6 +28838,59 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that all four 32-bit segments are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with VSPLTISW).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode))
+    {
+      /* Can we use VSPLTISW to load the constant?  */
+      int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+      if (EASY_VECTOR_15 (sign_value))
+	return false;
+
+      /* Can we use XXSPLTIB to load the constant?  */
+      if (vec_const->bytes[0] == vec_const->bytes[1]
+	  && vec_const->bytes[0] == vec_const->bytes[2]
+	  && vec_const->bytes[0] == vec_const->bytes[3])
+	return false;
+
+      /* Can we use VSPLTISH to load the constant?  */
+      if (vec_const->h_words[0] == vec_const->h_words[1])
+	{
+	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
+	  if (EASY_VECTOR_15 (sign_value16))
+	    return false;
+	}
+    }
+
+  /* Record the immediate in the vec_const structure for XXSPLTIW.  */
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a22525000..07b0b671920 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,7 +1192,7 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
@@ -1241,7 +1241,7 @@
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP  LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
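
The integer path of vec_const_use_xxspltiw in the diff above picks XXSPLTIW
only after ruling out the shorter splat instructions.  A minimal standalone
sketch of that ordering follows.  It is not code from the patch:
classify_int_word, sign_extend, and the splat_kind enum are hypothetical
names, the EASY_VECTOR_15 definition is assumed to match GCC's 5-bit signed
range (-16..15), and the byte and halfword splitting assumes the word is
viewed most-significant byte first.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: mirrors GCC's EASY_VECTOR_15, i.e. a 5-bit signed immediate.  */
#define EASY_VECTOR_15(n) ((n) >= -16 && (n) <= 15)

/* Hypothetical labels for the instruction that would load the constant.  */
enum splat_kind
{
  SPLAT_ZERO,		/* All zero, handled elsewhere.  */
  SPLAT_VSPLTISW,	/* Word fits the 5-bit signed range.  */
  SPLAT_XXSPLTIB,	/* All four bytes of the word are equal.  */
  SPLAT_VSPLTISH,	/* Both halfwords equal and in the 5-bit range.  */
  SPLAT_XXSPLTIW	/* Anything else that is a word splat.  */
};

/* Sign extend the low BITS bits of VALUE, using the same
   ((x & mask) ^ sign) - sign idiom as the patch.  */
static int32_t
sign_extend (uint32_t value, unsigned bits)
{
  assert (bits >= 1 && bits <= 32);
  int64_t mask = ((int64_t) 1 << bits) - 1;
  int64_t sign = (int64_t) 1 << (bits - 1);
  return (int32_t) ((((int64_t) value & mask) ^ sign) - sign);
}

/* Classify a 32-bit integer word that is replicated in all four lanes.  */
static enum splat_kind
classify_int_word (uint32_t word)
{
  if (word == 0)
    return SPLAT_ZERO;

  if (EASY_VECTOR_15 (sign_extend (word, 32)))
    return SPLAT_VSPLTISW;

  uint8_t b0 = (uint8_t) (word >> 24);
  uint8_t b1 = (uint8_t) (word >> 16);
  uint8_t b2 = (uint8_t) (word >> 8);
  uint8_t b3 = (uint8_t) word;
  if (b0 == b1 && b0 == b2 && b0 == b3)
    return SPLAT_XXSPLTIB;

  uint16_t h0 = (uint16_t) (word >> 16);
  uint16_t h1 = (uint16_t) word;
  if (h0 == h1 && EASY_VECTOR_15 (sign_extend (h0, 16)))
    return SPLAT_VSPLTISH;

  return SPLAT_XXSPLTIW;
}
```

Under these assumptions, classify_int_word (126) and
classify_int_word (0x007e007e) both return SPLAT_XXSPLTIW, which lines up
with the v4si_const_126 and v8hi_const_126 tests, while
classify_int_word (0x00010001) returns SPLAT_VSPLTISH as in v8hi_const_1.
The sketch ignores the floating point path, where the patch skips all of
these checks except the test for zero.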


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-15  4:16 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-15  4:16 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:edbeef13690c41cd38ae97108811e48548df03e2

commit edbeef13690c41cd38ae97108811e48548df03e2
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Oct 15 00:16:20 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    The eV constraint added with the XXSPLTIDP patch also recognizes constants
    that can be loaded with XXSPLTIW.  I have not updated the eS constraint
    because this patch does not yet add support for using XXSPLTIW to load
    SImode and HImode constants into vector registers.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-15  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_prefixed): Likewise.
            (easy_vector_constant): Likewise.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add field for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Update comment.
            (vsx_mov<mode>_32bit): Likewise.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.

Diff:
---
 gcc/config/rs6000/predicates.md                    |  11 +-
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 123 ++++++++++++++++++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |   4 +-
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 +++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 +++++++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 10 files changed, 343 insertions(+), 10 deletions(-)

diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 517ce08f03d..252abbbaf9a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,9 @@
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -644,7 +647,7 @@
 })
 
 ;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
-;; register with one prefixed instruction, such as XXSPLTIDP.
+;; register with one prefixed instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
 ;; We have to have separate predicates and constraints for scalars and vectors,
 ;; otherwise things get messed up with TImode when you try to load very large
@@ -666,6 +669,9 @@
   if (vec_const_use_xxspltidp (&vec_const))
     return true;
 
+  if (vec_const_use_xxspltiw (&vec_const))
+    return true;
+
   return false;
 })
 
@@ -744,6 +750,9 @@
 
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 6e8b81cb134..52f094dd410 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -240,11 +240,13 @@ typedef struct {
   unsigned char bytes[VECTOR_CONST_BYTES];
   machine_mode orig_mode;		/* Original mode.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned int lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d238dd84fe7..9c24c9bc3f7 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6925,12 +6925,17 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
-  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
-     sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
-    return false;
+  /* See if we could generate vspltisw/vspltish/xxspltiw directly instead of
+     xxspltib + sign extend.  Special case 0/-1 to allow getting any VSX
+     register instead of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0))
+    {
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      if (TARGET_XXSPLTIW && TARGET_PREFIXED && TARGET_VSX)
+	return false;
+    }
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
@@ -7004,6 +7009,52 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
+
+	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  */
+	      if (dest_vmx_p)
+		{
+		  HOST_WIDE_INT sign_imm
+		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+		  if (EASY_VECTOR_15 (sign_imm))
+		    {
+		      operands[2] = GEN_INT (sign_imm);
+		      return "vspltisw %0,%2";
+		    }
+
+		  if (vec_const.bytes[0] == vec_const.bytes[1]
+		      && vec_const.bytes[0] == vec_const.bytes[2]
+		      && vec_const.bytes[0] == vec_const.bytes[3])
+		    {
+		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
+		      if (EASY_VECTOR_15 (sign_imm8))
+			{
+			  operands[2] = GEN_INT (sign_imm8);
+			  return "vspltisb %0,%2";
+			}
+		    }
+
+		  if (vec_const.h_words[0] == vec_const.h_words[1])
+		    {
+		      HOST_WIDE_INT sign_imm16
+			= ((imm & 0xffff) ^ 0x8000) - 0x8000;
+
+		      if (EASY_VECTOR_15 (sign_imm16))
+			{
+			  operands[2] = GEN_INT (sign_imm16);
+			  return "vspltish %0,%2";
+			}
+		    }
+		}
+
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26770,6 +26821,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28784,6 +28838,63 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that all four 32-bit segments are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with VSPLTISW).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode))
+    {
+      /* Can we use VSPLTISW to load the constant?  */
+      int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+      if (EASY_VECTOR_15 (sign_value))
+	return false;
+
+      /* Can we use VSPLTISB to load the constant?  */
+      if (vec_const->bytes[0] == vec_const->bytes[1]
+	  && vec_const->bytes[0] == vec_const->bytes[2]
+	  && vec_const->bytes[0] == vec_const->bytes[3])
+	{
+	  int sign_value8 = ((value & 0xff) ^ 0x80) - 0x80;
+	  if (EASY_VECTOR_15 (sign_value8))
+	    return false;
+	}
+
+      /* Can we use VSPLTISH to load the constant?  */
+      if (vec_const->h_words[0] == vec_const->h_words[1])
+	{
+	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
+	  if (EASY_VECTOR_15 (sign_value16))
+	    return false;
+	}
+    }
+
+  /* Record the immediate in the vec_const structure for XXSPLTIW.  */
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 15a22525000..07b0b671920 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,7 +1192,7 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
@@ -1241,7 +1241,7 @@
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP  LXVKQ
+;;              XXSPLTI*   LXVKQ
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 5f84930e1a7..6c01666b625 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -149,7 +149,7 @@ main (int argc, char *argv [])
   return 0;
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
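
The v8hi_const_1023_1000 test above relies on a V8HI constant with
alternating elements still being a splat of a single 32-bit word.  A rough
sketch of that check outside of GCC follows.  It is not code from the patch:
pack_v8hi and word_splat_immediate are hypothetical helpers, and placing the
lower-numbered element in the high half of each word is an assumption about
the layout (big-endian element order), not something the diff states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Pack eight 16-bit elements into four 32-bit words.  Assumption: the
   lower-numbered element of each pair lands in the high half.  */
static void
pack_v8hi (const uint16_t elts[8], uint32_t words[4])
{
  for (int i = 0; i < 4; i++)
    words[i] = ((uint32_t) elts[2 * i] << 16) | elts[2 * i + 1];
}

/* Return true if all four words are equal, storing the shared word in
   *IMM.  *IMM is left untouched when the words differ.  */
static bool
word_splat_immediate (const uint32_t words[4], uint32_t *imm)
{
  assert (words && imm);

  uint32_t value = words[0];
  if (value != words[1] || value != words[2] || value != words[3])
    return false;

  *imm = value;
  return true;
}
```

With the elements { 1023, 1000, 1023, 1000, 1023, 1000, 1023, 1000 } and the
layout assumed here, every word packs to 0x03ff03e8, so the constant
qualifies as a word splat even though it is not a halfword splat.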


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-14 18:15 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-14 18:15 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:90e34b175735acc14d2ddaf5be4411ad418c559e

commit 90e34b175735acc14d2ddaf5be4411ad418c559e
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 14:14:43 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-14  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_32bit_element): New predicate.
            (easy_vector_constant): Add support for XXSPLTIW.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add fields for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (xxspltib_constant_p): If we can generate
            XXSPLTIW, don't do XXSPLTIB and sign extend.
            (output_vec_const_move): Add support for XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            constants loaded with XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eW constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  22 ++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 123 ++++++++++++++++++++-
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  28 ++---
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  |  27 +++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   |  67 +++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   |  51 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   |  62 +++++++++++
 10 files changed, 371 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index a15b659d9d7..d2a1c088995 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -223,6 +223,11 @@
   "An IEEE 128-bit constant that can be loaded with the LXVKQ instruction."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIW instruction.
+(define_constraint "eW"
+  "A constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "easy_vector_constant_32bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index de191fff08a..2a2bdbe463a 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -611,6 +611,9 @@
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -638,6 +641,22 @@
 	  && vec_const_use_xxspltidp (&vec_const));
 })
 
+;; Return 1 if the operand is a 32-bit vector constant that can be loaded via
+;; the XXSPLTIW instruction.
+
+(define_predicate "easy_vector_constant_32bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we do the XXSPLTIW instruction?  */
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  return (vec_const_to_bytes (op, mode, &vec_const)
+	  && vec_const_use_xxspltiw (&vec_const));
+})
+
 ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
 ;; via the LXVKQ instruction.
 
@@ -713,6 +732,9 @@
 
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 43c0f96aab5..db0ad716968 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -240,11 +240,13 @@ typedef struct {
   unsigned char bytes[VECTOR_CONST_BYTES];
   machine_mode orig_mode;		/* Original mode.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 359379348bb..1bd3f7c9c52 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6925,12 +6925,17 @@ xxspltib_constant_p (rtx op,
   else
     return false;
 
-  /* See if we could generate vspltisw/vspltish directly instead of xxspltib +
-     sign extend.  Special case 0/-1 to allow getting any VSX register instead
-     of an Altivec register.  */
-  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0)
-      && EASY_VECTOR_15 (value))
-    return false;
+  /* See if we could generate vspltisw/vspltish/xxspltiw directly instead of
+     xxspltib + sign extend.  Special case 0/-1 to allow getting any VSX
+     register instead of an Altivec register.  */
+  if ((mode == V4SImode || mode == V8HImode) && !IN_RANGE (value, -1, 0))
+    {
+      if (EASY_VECTOR_15 (value))
+	return false;
+
+      if (TARGET_XXSPLTIW && TARGET_PREFIXED && TARGET_VSX)
+	return false;
+    }
 
   /* Return # of instructions and the constant byte for XXSPLTIB.  */
   if (mode == V16QImode)
@@ -7004,6 +7009,52 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
+
+	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  */
+	      if (dest_vmx_p)
+		{
+		  HOST_WIDE_INT sign_imm
+		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+		  if (EASY_VECTOR_15 (sign_imm))
+		    {
+		      operands[2] = GEN_INT (sign_imm);
+		      return "vspltisw %0,%2";
+		    }
+
+		  if (vec_const.bytes[0] == vec_const.bytes[1]
+		      && vec_const.bytes[0] == vec_const.bytes[2]
+		      && vec_const.bytes[0] == vec_const.bytes[3])
+		    {
+		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
+		      if (EASY_VECTOR_15 (sign_imm8))
+			{
+			  operands[2] = GEN_INT (sign_imm8);
+			  return "vspltisb %0,%2";
+			}
+		    }
+
+		  if (vec_const.h_words[0] == vec_const.h_words[1])
+		    {
+		      HOST_WIDE_INT sign_imm16
+			= ((imm & 0xffff) ^ 0x8000) - 0x8000;
+
+		      if (EASY_VECTOR_15 (sign_imm16))
+			{
+			  operands[2] = GEN_INT (sign_imm16);
+			  return "vspltish %0,%2";
+			}
+		    }
+		}
+
+	      operands[2] = GEN_INT (imm);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26770,6 +26821,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
     {
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28793,6 +28847,63 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that each of the 4 32-bit segments are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with VSPLTISW).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode))
+    {
+      /* Can we use VSPLTISW to load the constant?  */
+      int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+      if (EASY_VECTOR_15 (sign_value))
+	return false;
+
+      /* Can we use VSPLTISB to load the constant?  */
+      if (vec_const->bytes[0] == vec_const->bytes[1]
+	  && vec_const->bytes[0] == vec_const->bytes[2]
+	  && vec_const->bytes[0] == vec_const->bytes[3])
+	{
+	  int sign_value8 = ((value & 0xff) ^ 0x80) - 0x80;
+	  if (EASY_VECTOR_15 (sign_value8))
+	    return false;
+	}
+
+      /* Can we use VSPLTISH to load the constant?  */
+      if (vec_const->h_words[0] == vec_const->h_words[1])
+	{
+	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
+	  if (EASY_VECTOR_15 (sign_value16))
+	    return false;
+	}
+    }
+
+  /* Record the immediate in the vec_const structure for XXSPLTIW.  */
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index b36bbcd2b4e..0d0609b5d0b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,19 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTIDP LXVKQ
+;;              XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
-                wa,        wa,
+                wa,        wa,        wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                eD,        eQ,
+                eD,        eQ,        eW,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1216,46 +1216,46 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         *,
+                *,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         *,
+                *,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
-                *,         *,
+                *,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
-                p10,       p10,
+                p10,       p10,       p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTIDP  LXVKQ
+;;              XXSPLTIDP  LXVKQ      XXSPLTIW
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
                 wa,        v,         ?wa,
-                wa,        wa,
+                wa,        wa,        wa,
                 v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
                 wE,        jwM,       ?jwM,
-                eD,        eQ,
+                eD,        eQ,        eW,
                 W,         <nW>,
                 v,         wZ"))]
 
@@ -1268,19 +1268,19 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
                 vecsimple, vecsimple, vecsimple,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
                 *,         *,         *,
-                *,         *,
+                *,         *,         *,
                 20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 p9v,       *,         <VSisa>,
-                p10,       p10,
+                p10,       p10,       p10,
                 *,         *,
                 *,         *")])
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v8hi_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..da909e948b2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VEXTSB2W) is not done.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };			/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);				/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);				/* XXSPLTIW.  */
+}
+
+vector int
+v8hi_splats_1023 (void)
+{
+  return vec_splats (1023);				/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  4 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvextsb2w\M}    } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
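
The v16qi_const_2 and v8hi_const_1023_1000 tests above depend on narrower
elements repeating with a 32-bit period, so that the whole vector is one word
replicated four times.  The C sketch below shows that reduction on a 16-byte
image.  The function names are hypothetical, and combining bytes
most-significant first is an assumption made for illustration rather than a
statement about how vec_const_to_bytes lays out its buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the 32-bit word repeated across a 16-byte vector image, or -1 if
   the four words differ.  Bytes are combined most-significant first.  */
int64_t
splat_word_from_bytes (const uint8_t bytes[16])
{
  assert (bytes != NULL);

  uint32_t words[4];
  for (size_t i = 0; i < 4; i++)
    words[i] = ((uint32_t) bytes[4 * i] << 24)
	       | ((uint32_t) bytes[4 * i + 1] << 16)
	       | ((uint32_t) bytes[4 * i + 2] << 8)
	       | (uint32_t) bytes[4 * i + 3];

  if (words[0] != words[1] || words[0] != words[2] || words[0] != words[3])
    return -1;

  return (int64_t) words[0];
}

/* Same check for eight 16-bit elements, each stored most-significant byte
   first before the words are compared.  */
int64_t
splat_word_from_halfwords (const uint16_t elts[8])
{
  assert (elts != NULL);

  uint8_t bytes[16];
  for (size_t i = 0; i < 8; i++)
    {
      bytes[2 * i] = (uint8_t) (elts[i] >> 8);
      bytes[2 * i + 1] = (uint8_t) (elts[i] & 0xff);
    }

  return splat_word_from_bytes (bytes);
}
```

With this layout the alternating { 1023, 1000, ... } vector reduces to the
single word 0x03ff03e8, which is the kind of value a 32-bit splat immediate
can carry.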



* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIW on power10.
@ 2021-10-14 13:39 Michael Meissner
  0 siblings, 0 replies; 13+ messages in thread
From: Michael Meissner @ 2021-10-14 13:39 UTC (permalink / raw)
  To: gcc-cvs

https://gcc.gnu.org/g:27b9f117fdbcbab893f79a09961201c7ce0c0c8c

commit 27b9f117fdbcbab893f79a09961201c7ce0c0c8c
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Thu Oct 14 09:38:59 2021 -0400

    Generate XXSPLTIW on power10.
    
    This patch adds support to automatically generate the ISA 3.1 XXSPLTIW
    instruction for V8HImode, V4SImode, and V4SFmode vectors.  It does this by
    adding support for vector constants that can be used, and adding a
    VEC_DUPLICATE pattern to generate the actual XXSPLTIW instruction.
    
    I rewrote the XXSPLTW built-in functions to use VEC_DUPLICATE instead of
    UNSPEC.
    
    This patch also updates the insn counts in the vec-splati-runnable.c test to
    work with the new option to use XXSPLTIW to load up some vector constants.
    
    I added 4 new tests to test loading up V16QI, V8HI, V4SI, and V4SF vector
    constants.
    
    2021-10-14  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eW): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            XXSPLTIW.
            (easy_vector_constant_32bit_element): New predicate.
            (easy_vector_constant): Add support for XXSPLTIW.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add fields for
            XXSPLTIW.
            (vec_const_use_xxspltiw): New declaration.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIW.
            (prefixed_xxsplti_p): Recognize XXSPLTIW instructions as
            prefixed.
            (vec_const_use_xxspltiw): New function.
            * config/rs6000/rs6000.opt (-mxxspltiw): New debug switch.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            constants loaded with XXSPLTIW.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eW constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-v16qi.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v4si.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v8hi.c: New test.
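
For V4SFmode the patch takes the immediate from the first 32-bit word of the
constant, so the value handed to XXSPLTIW is the bit pattern of the float
rather than a converted integer.  The C sketch below shows that mapping.
float_splat_immediate is a hypothetical name, and the code assumes that float
is a 32-bit IEEE 754 single-precision type on the host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the raw 32-bit pattern of F, which is what a word splat of a
   single-precision constant would need as its immediate.  */
uint32_t
float_splat_immediate (float f)
{
  uint32_t bits;

  /* The sketch only makes sense when float is exactly 32 bits wide.  */
  assert (sizeof bits == sizeof f);

  /* memcpy avoids the aliasing problems of a pointer cast.  */
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Under that assumption 1.0f maps to 0x3f800000, while +0.0f maps to 0, the one
value that vec_const_use_xxspltiw rejects outright.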

Diff:
---
 gcc/config/rs6000/constraints.md                   |  5 ++
 gcc/config/rs6000/predicates.md                    | 25 ++++++++
 gcc/config/rs6000/rs6000-protos.h                  |  3 +
 gcc/config/rs6000/rs6000.c                         | 46 +++++++++++++++
 gcc/config/rs6000/rs6000.opt                       |  4 ++
 gcc/config/rs6000/vsx.md                           | 28 ++++-----
 .../gcc.target/powerpc/vec-splat-constant-v16qi.c  | 27 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4sf.c   | 67 ++++++++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v4si.c   | 51 ++++++++++++++++
 .../gcc.target/powerpc/vec-splat-constant-v8hi.c   | 62 ++++++++++++++++++++
 10 files changed, 304 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index a15b659d9d7..d2a1c088995 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -223,6 +223,11 @@
   "An IEEE 128-bit constant that can be loaded with the LXVKQ instruction."
   (match_operand 0 "easy_vector_constant_ieee128"))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIW instruction.
+(define_constraint "eW"
+  "A constant that can be loaded with the XXSPLTIW instruction."
+  (match_operand 0 "easy_vector_constant_32bit_element"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 2c9c0a29845..cb7af8b33c4 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -610,6 +610,9 @@
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
 
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
+
       if (vec_const_use_lxvkq (&vec_const))
 	return true;
     }
@@ -642,6 +645,25 @@
   return vec_const_use_xxspltidp (&vec_const);
 })
 
+;; Return 1 if the operand is a 32-bit vector constant that can be loaded via
+;; the XXSPLTIW instruction.
+
+(define_predicate "easy_vector_constant_32bit_element"
+  (match_code "const_vector,vec_duplicate,const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Can we do the XXSPLTIW instruction?  */
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Convert the vector constant to bytes.  */
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+
+  return vec_const_use_xxspltiw (&vec_const);
+})
+
 ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
 ;; via the LXVKQ instruction.
 
@@ -718,6 +740,9 @@
 	  if (vec_const_use_xxspltidp (&vec_const))
 	    return true;
 
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    return true;
+
 	  if (vec_const_use_lxvkq (&vec_const))
 	    return true;
 	}
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 388fe18e314..089e1b33c43 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -243,6 +243,8 @@ typedef struct {
   bool is_xxspltidp;			/* Use XXSPLTIDP to load constant.  */
   machine_mode xxspltidp_mode;		/* Mode to use for XXSPLTIDP.  */
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
+  bool is_xxspltiw;			/* Use XXSPLTIW to load constant.  */
+  unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   bool is_lxvkq;			/* LXVKQ can load the constant.  */
   unsigned lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
   bool is_prefixed;			/* Prefixed instruction used.  */
@@ -250,6 +252,7 @@ typedef struct {
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
+extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
 #endif /* RTX_CODE */
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2a038ea7dea..4b800ab7d52 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7004,6 +7004,12 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
 	      return "xxspltidp %x0,%2";
 	    }
+
+	  if (vec_const_use_xxspltiw (&vec_const))
+	    {
+	      operands[2] = GEN_INT (vec_const.xxspltiw_immediate);
+	      return "xxspltiw %x0,%2";
+	    }
 	}
 
       if (TARGET_P9_VECTOR
@@ -26773,6 +26779,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
 
       if (vec_const_use_xxspltidp (&vec_const))
 	return true;
+
+      if (vec_const_use_xxspltiw (&vec_const))
+	return true;
     }
 
   return false;
@@ -28792,6 +28801,43 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that each of the 4 32-bit segments are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Avoid values that are easy to create with other instructions (0.0 for
+     floating point, and values that can be loaded with XXSPLTIB and sign
+     extension for integer).  */
+  if (value == 0)
+    return false;
+
+  machine_mode mode = vec_const->orig_mode;
+  if (mode == VOIDmode)
+    mode = SImode;
+
+  if (!FLOAT_MODE_P (mode) && IN_RANGE (value, -128, 127))
+    return false;
+
+  /* Record the information in the vec_const structure for XXSPLTIW.  */
+  vec_const->is_xxspltiw = true;
+  vec_const->is_prefixed = true;
+  vec_const->xxspltiw_immediate = value;
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index c9eb78952d6..015bf91b6d5 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -644,6 +644,10 @@ mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
 
+mxxspltiw
+Target Undocumented Var(TARGET_XXSPLTIW) Init(1) Save
+Generate (do not generate) XXSPLTIW instructions.
+
 mlxvkq
 Target Undocumented Var(TARGET_LXVKQ) Init(1) Save
 Generate (do not generate) LXVKQ instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index eddbf395e77..95b29dd38dd 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1193,19 +1193,19 @@
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)  XXLSPLTIDP
-;;              LXVKQ
+;;              LXVKQ      XXSPLTIW
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v,         wa,
-                wa")
+                wa,        wa")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ,        eD,
-                eQ"))]
+                eQ,        eW"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1217,41 +1217,41 @@
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload,   vecperm,
-                vecperm")
+                vecperm,   vecperm")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *,         *,
-                *")
+                *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *,         *,
-                *")
+                *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *,         *,
-                *")
+                *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *,         p10,
-                p10")])
+                p10,       p10")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
-;;              LVX (VMX)  STVX (VMX) XXSPLTID   LXVKQ
+;;              LVX (VMX)  STVX (VMX) XXSPLTID   LXVKQ      XXSPLTIW
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
                 wa,        v,         ?wa,       v,         <??r>,
-                wZ,        v,         wa,        wa")
+                wZ,        v,         wa,        wa,        wa")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
                 wE,        jwM,       ?jwM,      W,         <nW>,
-                v,         wZ,        eD,        eQ"))]
+                v,         wZ,        eD,        eQ,        eW"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
    && (register_operand (operands[0], <MODE>mode) 
@@ -1262,15 +1262,15 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
                 vecsimple, vecsimple, vecsimple, *,         *,
-                vecstore,  vecload,   vecperm,   vecperm")
+                vecstore,  vecload,   vecperm,   vecperm,   vecperm")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
                 *,         *,         *,         20,        16,
-                *,         *,         *,         *")
+                *,         *,         *,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 p9v,       *,         <VSisa>,   *,         *,
-                *,         *,         p10,       p10")])
+                *,         *,         p10,       p10,       p10")])
 
 ;; Explicit  load/store expanders for the builtin functions
 (define_expand "vsx_load_<mode>"
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
new file mode 100644
index 00000000000..2707d86e6fd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v16qi.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V16QI vector constants where the
+   first 4 elements are the same as the next 4 elements, etc.  */
+
+vector unsigned char
+v16qi_const_1 (void)
+{
+  return (vector unsigned char) { 1, 1, 1, 1, 1, 1, 1, 1,
+				  1, 1, 1, 1, 1, 1, 1, 1, }; /* VSPLTISB.  */
+}
+
+vector unsigned char
+v16qi_const_2 (void)
+{
+  return (vector unsigned char) { 1, 2, 3, 4, 1, 2, 3, 4,
+				  1, 2, 3, 4, 1, 2, 3, 4, }; /* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}              1 } } */
+/* { dg-final { scan-assembler-times {\mvspltisb\M|\mxxspltib\M} 1 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}                   } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}                    } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
new file mode 100644
index 00000000000..05d4ee3f5cb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4sf.c
@@ -0,0 +1,67 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SF vector constants.  */
+
+vector float
+v4sf_const_1 (void)
+{
+  return (vector float) { 1.0f, 1.0f, 1.0f, 1.0f };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_nan (void)
+{
+  return (vector float) { __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf (""),
+			  __builtin_nanf ("") };	/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_inf (void)
+{
+  return (vector float) { __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff (),
+			  __builtin_inff () };		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_const_m0 (void)
+{
+  return (vector float) { -0.0f, -0.0f, -0.0f, -0.0f };	/* XXSPLTIB/VSLW.  */
+}
+
+vector float
+v4sf_splats_1 (void)
+{
+  return vec_splats (1.0f);				/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_nan (void)
+{
+  return vec_splats (__builtin_nanf (""));		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_inf (void)
+{
+  return vec_splats (__builtin_inff ());		/* XXSPLTIW.  */
+}
+
+vector float
+v4sf_splats_m0 (void)
+{
+  return vec_splats (-0.0f);				/* XXSPLTIB/VSLW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvslw\M}     2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}      } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}       } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
new file mode 100644
index 00000000000..2cefe3ffa70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v4si.c
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V4SI vector constants.  Constants
+   that fit VSPLTISW or the power9 XXSPLTIB/VEXTSB2W sequence still use those.  */
+
+vector int
+v4si_const_1 (void)
+{
+  return (vector int) { 1, 1, 1, 1 };		/* VSPLTISW.  */
+}
+
+vector int
+v4si_const_126 (void)
+{
+  return (vector int) { 126, 126, 126, 126 };	/* XXSPLTIB/VEXTSB2W.  */
+}
+
+vector int
+v4si_const_1023 (void)
+{
+  return (vector int) { 1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector int
+v4si_splats_1 (void)
+{
+  return vec_splats (1);			/* VSPLTISW.  */
+}
+
+vector int
+v4si_splats_126 (void)
+{
+  return vec_splats (126);			/* XXSPLTIB/VEXTSB2W.  */
+}
+
+vector int
+v4si_splats_1023 (void)
+{
+  return vec_splats (1023);			/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvspltisw\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltib\M}  2 } } */
+/* { dg-final { scan-assembler-times {\mvextsb2w\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
new file mode 100644
index 00000000000..290e05d4a64
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v8hi.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxspltiw" } */
+
+#include <altivec.h>
+
+/* Test whether XXSPLTIW is generated for V8HI vector constants.  We make sure
+   the power9 support (XXSPLTIB/VUPKLSB) is not done.  */
+
+vector short
+v8hi_const_1 (void)
+{
+  return (vector short) { 1, 1, 1, 1, 1, 1, 1, 1 };	/* VSPLTISH.  */
+}
+
+vector short
+v8hi_const_126 (void)
+{
+  return (vector short) { 126, 126, 126, 126,
+			  126, 126, 126, 126 };		/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_const_1023 (void)
+{
+  return (vector short) { 1023, 1023, 1023, 1023,
+			  1023, 1023, 1023, 1023 };	/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1 (void)
+{
+  return vec_splats ((short)1);				/* VSPLTISH.  */
+}
+
+vector short
+v8hi_splats_126 (void)
+{
+  return vec_splats ((short)126);			/* XXSPLTIW.  */
+}
+
+vector short
+v8hi_splats_1023 (void)
+{
+  return vec_splats ((short)1023);			/* XXSPLTIW.  */
+}
+
+/* Test that we can optimize V8HI where all of the even elements are the same
+   and all of the odd elements are the same.  */
+vector short
+v8hi_const_1023_1000 (void)
+{
+  return (vector short) { 1023, 1000, 1023, 1000,
+			  1023, 1000, 1023, 1000 };	/* XXSPLTIW.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltiw\M}  5 } } */
+/* { dg-final { scan-assembler-times {\mvspltish\M}  2 } } */
+/* { dg-final { scan-assembler-not   {\mxxspltib\M}    } } */
+/* { dg-final { scan-assembler-not   {\mvupklsb\M}     } } */
+/* { dg-final { scan-assembler-not   {\mlxvx?\M}       } } */
+/* { dg-final { scan-assembler-not   {\mplxv\M}        } } */


