public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTI32DX on power10.
@ 2021-10-15 23:18 Michael Meissner
From: Michael Meissner @ 2021-10-15 23:18 UTC
  To: gcc-cvs

https://gcc.gnu.org/g:c8cf12ca3db525b5345cb9cb87b31a18496f40e7

commit c8cf12ca3db525b5345cb9cb87b31a18496f40e7
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Fri Oct 15 19:17:52 2021 -0400

    Generate XXSPLTI32DX on power10.
    
    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be generated with the XXSPLTIDP instruction.  In addition, it adds
    support for using XXSPLTI32DX to load V2DF vector constants whose two
    elements are the same.
    
    At the present time, XXSPLTI32DX is not enabled by default.
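
As a rough illustration of the split described above, the sketch below classifies a double constant by whether it survives a round trip through single precision. It assumes, from the Power ISA description of XXSPLTIDP, that the instruction takes a 32-bit single-precision immediate and widens it to double precision, and that single-precision denormal immediates are not usable. The helper names (`bits_of_float`, `bits_of_double`, `fits_xxspltidp`) are hypothetical and are not GCC functions; the real checks in rs6000.c also cover zero, integer splats, and LXVKQ values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: raw IEEE 754 bits of a float.  */
static uint32_t
bits_of_float (float f)
{
  uint32_t u;
  assert (sizeof (float) == sizeof (uint32_t));
  memcpy (&u, &f, sizeof u);
  return u;
}

/* Hypothetical helper: raw IEEE 754 bits of a double.  */
static uint64_t
bits_of_double (double d)
{
  uint64_t u;
  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Return true if D is exactly representable as a single-precision value
   that is not denormal.  Under the assumption stated above, such a constant
   is a candidate for one XXSPLTIDP; anything else would need the two
   instruction XXSPLTI32DX sequence (or a load from memory).  */
static bool
fits_xxspltidp (double d)
{
  float f = (float) d;

  /* The round trip must reproduce the exact bit pattern.  */
  if (bits_of_double ((double) f) != bits_of_double (d))
    return false;

  /* Reject single-precision denormals: zero exponent, nonzero mantissa.  */
  uint32_t u = bits_of_float (f);
  bool sf_denormal = (u & 0x7f800000u) == 0 && (u & 0x007fffffu) != 0;
  return !sf_denormal;
}
```

Under these assumptions, 1.0 and -2.5 pass the check, while 0.1 does not, since it has no exact single-precision representation.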
    
    2021-10-15  Michael Meissner  <meissner@linux.ibm.com>
    
    gcc/
    
            * config/rs6000/constraints.md (eJ): New constraint.
            (eK): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (vsx_prefixed_scalar_constant_2insn): New predicate.
            (vsx_prefixed_vector_constant_2insn): New predicate.
            (vsx_prefixed_constant_2insn): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (rs6000_vec_const): Add
            xxsplti32dx fields.
            (vec_const_use_xxsplti32dx): New declaration.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            generating XXSPLTI32DX.  Also support generating XXSPLTIB instead
            of VSPLTISB for XXSPLTIW.
            (vec_const_simple_constant): New function, split from
            vec_const_use_xxspltiw.
            (vec_const_use_xxspltiw): Move some code to
            vec_const_simple_constant.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            (vec_const_use_xxsplti32dx): New function.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator):  Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New debug option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded with
            XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eJ and eK constraints.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.
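
The core test in vec_const_use_xxsplti32dx (both 64-bit halves identical, then the first two 32-bit words recorded as the upper and lower immediates) can be sketched in standalone C as follows. The `toy_vec_const` struct and the `toy_splat64` and `toy_use_xxsplti32dx` functions are simplified stand-ins invented for this illustration, not the real `rs6000_vec_const` type. The word order is assumed to be most-significant word first, matching how the patch takes words[0] as the upper value. The sketch omits the target-flag tests and the checks that reject constants already handled by XXSPLTIDP, XXSPLTIW, LXVKQ, or non-prefixed instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for rs6000_vec_const: a 128-bit
   constant viewed as four 32-bit words, most-significant word first.  */
typedef struct {
  uint32_t words[4];
  uint32_t upper;	/* Plays the role of xxsplti32dx_upper.  */
  uint32_t lower;	/* Plays the role of xxsplti32dx_lower.  */
} toy_vec_const;

/* Build a toy constant by copying one 64-bit value into both halves.  */
static toy_vec_const
toy_splat64 (uint64_t value)
{
  toy_vec_const vc = { { 0, 0, 0, 0 }, 0, 0 };
  vc.words[0] = vc.words[2] = (uint32_t) (value >> 32);
  vc.words[1] = vc.words[3] = (uint32_t) value;
  return vc;
}

/* Return true and fill in the two 32-bit immediates if both 64-bit halves
   of the constant are the same; otherwise return false.  */
static bool
toy_use_xxsplti32dx (toy_vec_const *vc)
{
  assert (vc);

  if (vc->words[0] != vc->words[2] || vc->words[1] != vc->words[3])
    return false;

  vc->upper = vc->words[0];
  vc->lower = vc->words[1];
  return true;
}
```

For the double constant 0.1, whose bit pattern is 0x3FB999999999999A, the sketch yields an upper immediate of 0x3FB99999 and a lower immediate of 0x9999999A, one per XXSPLTI32DX instruction.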

Diff:
---
 gcc/config/rs6000/constraints.md                   |  12 ++
 gcc/config/rs6000/predicates.md                    |  74 +++++++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   3 +
 gcc/config/rs6000/rs6000.c                         | 116 ++++++++++++++-----
 gcc/config/rs6000/rs6000.md                        | 123 ++++++++++++++++-----
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 111 ++++++++++++++++---
 gcc/doc/md.texi                                    |   8 ++
 .../gcc.target/powerpc/vec-splat-constant-df-2.c   |  24 ++++
 .../gcc.target/powerpc/vec-splat-constant-di-2.c   |  38 +++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df-2.c |  24 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di-2.c |  29 +++++
 12 files changed, 492 insertions(+), 74 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index e645f405588..cd10824edb9 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,18 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A scalar constant that can be loaded into vector registers with two prefixed
+;; instructions such as XXSPLTI32DX.
+(define_constraint "eJ"
+  "A scalar constant that can be loaded with two prefixed instructions."
+  (match_operand 0 "vsx_prefixed_scalar_constant_2insn"))
+
+;; A vector constant that can be loaded into vector registers with two prefixed
+;; instructions such as XXSPLTI32DX.
+(define_constraint "eK"
+  "A vector constant that can be loaded with two prefixed instructions."
+  (match_operand 0 "vsx_prefixed_vector_constant_2insn"))
+
 ;; A scalar constant that can be loaded into vector registers with one prefixed
 ;; instruction such as XXSPLTIDP.
 (define_constraint "eS"
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 252abbbaf9a..cdb8c25517d 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -614,6 +614,9 @@
 
       if (vec_const_use_xxspltiw (&vec_const))
 	return true;
+
+      if (vec_const_use_xxsplti32dx (&vec_const))
+	return true;
     }
 
   /* Otherwise consider floating point constants hard, so that the
@@ -646,6 +649,28 @@
   return false;
 })
 
+;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
+;; register with two prefixed instructions, such as XXSPLTI32DX.
+
+(define_predicate "vsx_prefixed_scalar_constant_2insn"
+  (match_code "const_int,const_double")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxsplti32dx (&vec_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a scalar constant that can be loaded to a VSX
 ;; register with one prefixed instruction, such as XXSPLTIDP or XXSPLTIW.
 ;;
@@ -675,6 +700,52 @@
   return false;
 })
 
+;; Return 1 if the operand is a vector constant that can be loaded to a VSX
+;; register with two prefixed instructions, such as XXSPLTI32DX.
+;;
+;; We have to have separate predicates and constraints for scalars and vectors,
+;; otherwise things get messed up with TImode when you try to load very large
+;; integer constants.
+
+(define_predicate "vsx_prefixed_vector_constant_2insn"
+  (match_code "const_vector,vec_duplicate")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxsplti32dx (&vec_const))
+    return true;
+
+  return false;
+  })
+
+;; Combination of the two prefixed vector constant predicates.
+(define_predicate "vsx_prefixed_constant_2insn"
+  (match_code "const_int,const_double,const_vector,vec_duplicate")
+{
+  rs6000_vec_const vec_const;
+
+  /* Do we have prefixed instructions and VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (!vec_const_to_bytes (op, mode, &vec_const))
+    return false;
+  
+  if (vec_const_use_xxsplti32dx (&vec_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a special IEEE 128-bit value that can be loaded
 ;; via the LXVKQ instruction.
 
@@ -753,6 +824,9 @@
 
 	  if (vec_const_use_xxspltiw (&vec_const))
 	    return true;
+
+	  if (vec_const_use_xxsplti32dx (&vec_const))
+	    return true;
 	}
 
       return easy_altivec_constant (op, mode);
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 52f094dd410..36666c9f2d6 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -242,9 +242,12 @@ typedef struct {
   unsigned int xxspltidp_immediate;	/* Immediate value for XXSPLTIDP.  */
   unsigned int xxspltiw_immediate;	/* Immediate value for XXSPLTIW.  */
   unsigned int lxvkq_immediate;		/* Immediate to use with LXVKQ.  */
+  unsigned int xxsplti32dx_upper;	/* Upper value for XXSPLTI32DX.  */
+  unsigned int xxsplti32dx_lower;	/* Lower value for XXSPLTI32DX.  */
 } rs6000_vec_const;
 
 extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *);
+extern bool vec_const_use_xxsplti32dx (rs6000_vec_const *);
 extern bool vec_const_use_xxspltidp (rs6000_vec_const *);
 extern bool vec_const_use_xxspltiw (rs6000_vec_const *);
 extern bool vec_const_use_lxvkq (rs6000_vec_const *);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 838161fb23a..332ca3ca400 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -7014,9 +7014,19 @@ output_vec_const_move (rtx *operands)
 	    {
 	      HOST_WIDE_INT imm = vec_const.xxspltiw_immediate;
 
-	      /* See if we can generate the shorter VSPLTISB, VSPLTISH, or
-		 VSPLTISW instead of XXSPLTIW.  */
-	      if (dest_vmx_p)
+	      /* See if we can generate the shorter XXSPLTIB, VSPLTISH, or
+		 VSPLTISW instead of XXSPLTIW.  In theory, the alternatives
+		 should take care of this, but just in case, handle it
+		 here.  */
+	      if (vec_const.bytes[0] == vec_const.bytes[1]
+		  && vec_const.bytes[0] == vec_const.bytes[2]
+		  && vec_const.bytes[0] == vec_const.bytes[3])
+		{
+		  operands[2] = GEN_INT (imm & 0xff);
+		  return "xxspltib %x0,%2";
+		}
+
+	      else if (dest_vmx_p)
 		{
 		  HOST_WIDE_INT sign_imm
 		    = ((imm & 0xffffffff) ^ 0x80000000) - 0x80000000;
@@ -7027,18 +7037,6 @@ output_vec_const_move (rtx *operands)
 		      return "vspltisw %0,%2";
 		    }
 
-		  if (vec_const.bytes[0] == vec_const.bytes[1]
-		      && vec_const.bytes[0] == vec_const.bytes[2]
-		      && vec_const.bytes[0] == vec_const.bytes[3])
-		    {
-		      HOST_WIDE_INT sign_imm8 = ((imm & 0xff) ^ 0x80) - 0x80;
-		      if (EASY_VECTOR_15 (sign_imm8))
-			{
-			  operands[2] = GEN_INT (sign_imm8);
-			  return "vspltisb %0,%2";
-			}
-		    }
-
 		  if (vec_const.h_words[0] == vec_const.h_words[1])
 		    {
 		      HOST_WIDE_INT sign_imm16
@@ -7055,6 +7053,9 @@ output_vec_const_move (rtx *operands)
 	      operands[2] = GEN_INT (imm);
 	      return "xxspltiw %x0,%2";
 	    }
+
+	  if (vec_const_use_xxsplti32dx (&vec_const))
+	    return "#";
 	}
 
       if (TARGET_P9_VECTOR
@@ -26824,6 +26825,9 @@ prefixed_xxsplti_p (rtx_insn *insn)
 
       if (vec_const_use_xxspltiw (&vec_const))
 	return true;
+
+      if (vec_const_use_xxsplti32dx (&vec_const))
+	return true;
     }
 
   return false;
@@ -28838,15 +28842,13 @@ vec_const_use_xxspltidp (rs6000_vec_const *vec_const)
   return true;
 }
 
-/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
-   fill out the fields used to generate the instruction.  */
+/* Internal function to return true if a particular vector constant is simple
+   to generate without using prefixed instructions.  */
 
-bool
-vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+static bool
+vec_const_simple_constant (rs6000_vec_const *vec_const,
+			   machine_mode default_int_mode)
 {
-  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
-    return false;
-
   /* Make sure that each of the 4 32-bit segments are the same.  */
   unsigned int value = vec_const->words[0];
   if (value != vec_const->words[1]
@@ -28854,44 +28856,98 @@ vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
       || value != vec_const->words[3])
     return false;
 
-  /* Avoid values that are easy to create with other instructions (0.0 for
-     floating point, and values that can be loaded with VSPLTISW, VSPLTISH,
-     VSPLTISB, or XXSPLTISB.  */
+  /* Return true for values that are easy to create with other instructions
+     (0.0 for floating point, and values that can be loaded with VSPLTISW,
+     VSPLTISH, VSPLTISB, or XXSPLTIB).  */
   if (value == 0)
-    return false;
+    return true;
 
   machine_mode mode = vec_const->orig_mode;
   if (mode == VOIDmode)
-    mode = SImode;
+    mode = default_int_mode;
 
   if (!FLOAT_MODE_P (mode))
     {
       /* Can we use VSPLTISW to load the constant?  */
       int sign_value = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
       if (EASY_VECTOR_15 (sign_value))
-	return false;
+	return true;
 
       /* Can we use VSPLTISH to load the constant?  */
       if (vec_const->h_words[0] == vec_const->h_words[1])
 	{
 	  int sign_value16 = ((value & 0xffff) ^ 0x8000) - 0x8000;
 	  if (EASY_VECTOR_15 (sign_value16))
-	    return false;
+	    return true;
 	}
 
       /* Can we use XXSPLTISB/VSPLTISB to load the constant?  */
       if (vec_const->bytes[0] == vec_const->bytes[1]
 	  && vec_const->bytes[0] == vec_const->bytes[2]
 	  && vec_const->bytes[0] == vec_const->bytes[3])
-	return false;
+	return true;
     }
 
+  return false;
+}
+
+/* Determine if a vector constant can be loaded with XXSPLTIW.  If so,
+   fill out the fields used to generate the instruction.  */
+
+bool
+vec_const_use_xxspltiw (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTIW || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that each of the 4 32-bit segments are the same.  */
+  unsigned int value = vec_const->words[0];
+  if (value != vec_const->words[1]
+      || value != vec_const->words[2]
+      || value != vec_const->words[3])
+    return false;
+
+  /* Do not use XXSPLTIW for values that are easy to create with other
+     instructions (0.0 for floating point, and values that can be loaded with
+     VSPLTISW, VSPLTISH, VSPLTISB, or XXSPLTIB).  */
+  if (vec_const_simple_constant (vec_const, SImode))
+    return false;
+
   /* Record the immediate in the vec_const structure for XXSPLTIW.  */
   vec_const->xxspltiw_immediate = value;
 
   return true;
 }
 
+/* Determine if a vector constant can be loaded with 2 XXSPLTI32DX
+   instructions.  If so, fill out the fields used to generate the
+   instruction.  */
+
+bool
+vec_const_use_xxsplti32dx (rs6000_vec_const *vec_const)
+{
+  if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  /* Make sure that each of the 2 64-bit segments are the same.  */
+  if (vec_const->d_words[0] != vec_const->d_words[1])
+    return false;
+
+  /* Do not use XXSPLTI32DX for values that can be created with simple
+     non-prefixed instructions, or created by the other vector constant
+     instructions (XXSPLTIDP, XXSPLTIW, or LXVKQ).  */
+  if (vec_const_simple_constant (vec_const, DImode)
+      || vec_const_use_xxspltidp (vec_const)
+      || vec_const_use_xxspltiw (vec_const)
+      || vec_const_use_lxvkq (vec_const))
+    return false;
+
+  vec_const->xxsplti32dx_upper = vec_const->words[0];
+  vec_const->xxsplti32dx_lower = vec_const->words[1];
+
+  return true;
+}
+
 /* Determine if a vector constant can be loaded with LXVKQ.  If so, fill out
    the fields used to generate the instruction.  */
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 79ea4a82b4f..8a19a9c2208 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7764,17 +7764,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP  XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h,        wa")
+	  !r,        *c*l,      !r,         *h,        wa,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0,         eS"))]
+	  r,         r,         *h,         0,         eS,        eJ"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7797,15 +7797,24 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm,   vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *,         p10")])
+	 *,          *,         *,          *,         p10,       p10")
+   (set_attr "max_prefixed_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")
+   (set_attr "num_insns"
+	"*,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         *,
+	 *,          *,         *,          *,         *,         2")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8065,18 +8074,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR          XXSPLTIDP
+;;           LWZ          STW         MR          XXSPLTIDP   XXSPLTI32DX
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r,         wa")
+              Y,          r,          !r,         wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r,          eS"))]
+              r,          Y,          r,          eS,         eJ"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8094,20 +8103,29 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two,        vecperm")
+             store,       load,       two,        vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8,          *")
+             8,           8,          8,          *,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *,          p10")])
+             *,           *,          *,          p10,        p10")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8134,19 +8152,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP   XXSPLTI32DX
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>,   wa")
+            *h,           r,          <f64_dm>,   wa,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r,          eS"))]
+             0,           <f64_dm>,   r,          eS,         eJ"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8169,18 +8187,29 @@
    nop
    mfvsrd %0,%x1
    mtvsrd %x0,%1
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr,      vecperm")
+             *,           mfvsr,      mtvsr,      vecperm,    vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v,        p10")])
+             *,           p8v,        p8v,        p10,        p10")
+   (set_attr "num_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")
+   (set_attr "max_prefixed_insns"
+            "*,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          *,
+             *,           *,          *,          *,          2")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -9228,7 +9257,7 @@
 ;; a gpr into a fpr instead of reloading an invalid 'Y' address
 
 ;;        GPR store  GPR load   GPR move   FPR store  FPR load   FPR move
-;;        XXSPLTIDP
+;;        XXSPLTIDP  XXSPLTI32DX
 ;;        GPR const  AVX store  AVX store  AVX load   AVX load   VSX move
 ;;        P9 0       P9 -1      AVX 0/-1   VSX 0      VSX -1     P9 const
 ;;        AVX const  
@@ -9236,13 +9265,13 @@
 (define_insn "*movdi_internal32"
   [(set (match_operand:DI 0 "nonimmediate_operand"
          "=Y,        r,         r,         m,         ^d,        ^d,
-          ^wa,
+          ^wa,       ^wa,
           r,         wY,        Z,         ^v,        $v,        ^wa,
           wa,        wa,        v,         wa,        *i,        v,
           v")
 	(match_operand:DI 1 "input_operand"
          "r,         Y,         r,         ^d,        m,         ^d,
-          eS,
+          eS,        eJ,
           IJKnF,     ^v,        $v,        wY,        Z,         ^wa,
           Oj,        wM,        OjwM,      Oj,        wM,        wS,
           wB"))]
@@ -9258,6 +9287,7 @@
    fmr %0,%1
    #
    #
+   #
    stxsd %1,%0
    stxsdx %x1,%y0
    lxsd %0,%1
@@ -9272,23 +9302,35 @@
    #"
   [(set_attr "type"
          "store,     load,      *,         fpstore,   fpload,    fpsimple,
-          vecperm,
+          vecperm,   vecperm,
           *,         fpstore,   fpstore,   fpload,    fpload,    veclogical,
           vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple,
           vecsimple")
    (set_attr "size" "64")
    (set_attr "length"
          "8,         8,         8,         *,         *,         *,
-          *,
+          *,         *,
           16,        *,         *,         *,         *,         *,
           *,         *,         *,         *,         *,         8,
           *")
    (set_attr "isa"
          "*,         *,         *,         *,         *,         *,
-          p10,
+          p10,       p10,
           *,         p9v,       p7v,       p9v,       p7v,       *,
           p9v,       p9v,       p7v,       *,         *,         p7v,
-          p7v")])
+          p7v")
+   (set_attr "num_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")
+   (set_attr "max_prefixed_insns"
+         "*,         *,         *,         *,         *,         *,
+          *,         2,
+          *,         *,         *,         *,         *,         *,
+          *,         *,         *,         *,         *,         *,
+          *")])
 
 (define_split
   [(set (match_operand:DI 0 "gpc_reg_operand")
@@ -9321,7 +9363,7 @@
 })
 
 ;;	   GPR store   GPR load    GPR move
-;;	   XXSPLTIDP
+;;	   XXSPLTIDP   XXSPLTI32DX
 ;;	   GPR li      GPR lis     GPR pli     GPR #
 ;;	   FPR store   FPR load    FPR move
 ;;	   AVX store   AVX store   AVX load    AVX load    VSX move
@@ -9332,7 +9374,7 @@
 (define_insn "*movdi_internal64"
   [(set (match_operand:DI 0 "nonimmediate_operand"
 	  "=YZ,        r,          r,
-	   ^wa,
+	   ^wa,        ^wa,
 	   r,          r,          r,          r,
 	   m,          ^d,         ^d,
 	   wY,         Z,          $v,         $v,         ^wa,
@@ -9342,7 +9384,7 @@
 	   ?r,         ?wa")
 	(match_operand:DI 1 "input_operand"
 	  "r,          YZ,         r,
-	   eS,
+	   eS,         eJ,
 	   I,          L,          eI,         nF,
 	   ^d,         m,          ^d,
 	   ^v,         $v,         wY,         Z,          ^wa,
@@ -9362,6 +9404,7 @@
    lis %0,%v1
    li %0,%1
    #
+   #
    stfd%U0%X0 %1,%0
    lfd%U1%X1 %0,%1
    fmr %0,%1
@@ -9384,7 +9427,7 @@
    mtvsrd %x0,%1"
   [(set_attr "type"
 	  "store,      load,       *,
-	   vecperm,
+	   vecperm,    vecperm,
 	   *,          *,          *,          *,
 	   fpstore,    fpload,     fpsimple,
 	   fpstore,    fpstore,    fpload,     fpload,     veclogical,
@@ -9395,7 +9438,7 @@
    (set_attr "size" "64")
    (set_attr "length"
 	  "*,          *,          *,
-	   *,
+	   *,          *,
 	   *,          *,          *,          20,
 	   *,          *,          *,
 	   *,          *,          *,          *,          *,
@@ -9405,14 +9448,34 @@
 	   *,          *")
    (set_attr "isa"
 	  "*,          *,          *,
-	   p10,
+	   p10,        p10,
 	   *,          *,          p10,        *,
 	   *,          *,          *,
 	   p9v,        p7v,        p9v,        p7v,        *,
 	   p9v,        p9v,        p7v,        *,          *,
 	   p7v,        p7v,
 	   *,          *,          *,
-	   p8v,        p8v")])
+	   p8v,        p8v")
+   (set_attr "num_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")
+   (set_attr "max_prefixed_insns"
+	  "*,          *,          *,
+	   *,          2,
+	   *,          *,          *,          *,
+	   *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   *,          *,          *,          *,          *,
+	   8,          *,
+	   *,          *,          *,
+	   *,          *")])
 
 ; Some DImode loads are best done as a load of -1 followed by a mask
 ; instruction.
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 015bf91b6d5..25c56238861 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+mxxsplti32dx
+Target Undocumented Var(TARGET_XXSPLTI32DX) Init(1) Save
+Generate (do not generate) XXSPLTI32DX instructions.
+
 mxxspltidp
 Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
 Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 07b0b671920..5058addd9ae 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -376,6 +376,7 @@
    UNSPEC_XXSPLTIW
    UNSPEC_XXSPLTIDP
    UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
   ])
@@ -1192,19 +1193,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
-;;              XXLSPLTI*  LXVKQ
+;;              XXLSPLTI*  LXVKQ      XXSPLTI32DX
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
-                wa,        wa,
+                wa,        wa,        wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
-                eV,        eQ,
+                eV,        eQ,        eK,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1216,46 +1217,46 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         *,
+                *,         *,         2,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
-                *,         *,
+                *,         *,         2,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
-                *,         *,
+                *,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
-                p10,       p10,
+                p10,       p10,       p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1
-;;              XXSPLTI*   LXVKQ
+;;              XXSPLTI*   LXVKQ      XXSPLTI32DX
 ;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
                 wa,        v,         ?wa,
-                wa,        wa,
+                wa,        wa,        wa,
                 v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
                 wE,        jwM,       ?jwM,
-                eV,        eQ,
+                eV,        eQ,        eK,
                 W,         <nW>,
                 v,         wZ"))]
 
@@ -1268,20 +1269,26 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
                 vecsimple, vecsimple, vecsimple,
-                vecperm,   vecperm,
+                vecperm,   vecperm,   vecperm,
                 *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
                 *,         *,         *,
-                *,         *,
+                *,         *,         *,
                 20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 p9v,       *,         <VSisa>,
-                p10,       p10,
+                p10,       p10,       p10,
                 *,         *,
+                *,         *")
+   (set_attr "num_insns"
+               "*,         *,         *,         4,         4,         4,
+                *,         *,         *,
+                *,         *,         2,
+                5,         4,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
@@ -6579,6 +6586,82 @@
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; XXSPLTI32DX used to create 64-bit constants or vector constants where the
+;; even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [DI SF DF V16QI V8HI V4SI V4SF V2DI V2DF])
+
+;; Don't split DImode before register allocation, so that it has a better
+;; chance of winding up in a GPR register.
+(define_split
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand")
+	(match_operand:XXSPLTI32DX 1 "vsx_prefixed_constant_2insn"))]
+  "TARGET_POWER10 && (reload_completed || <MODE>mode != DImode)"
+  [(set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 2)
+			     (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+	(unspec:XXSPLTI32DX [(match_dup 0)
+			     (match_dup 4)
+			     (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  HOST_WIDE_INT high;
+  HOST_WIDE_INT low;
+  rs6000_vec_const vec_const;
+
+  if (!vec_const_to_bytes (operands[1], <MODE>mode, &vec_const))
+    gcc_unreachable ();
+
+  if (!vec_const_use_xxsplti32dx (&vec_const))
+    gcc_unreachable ();
+
+  high = vec_const.xxsplti32dx_upper;
+  low = vec_const.xxsplti32dx_lower;
+
+  /* If the low bits are 0 or all 1s, initialize that word first.  This way we
+     can use a smaller XXSPLTIB/XXLXOR/XXLORC instruction instead of the first
+     XXSPLTI32DX.  */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+})
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+	(unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+			     (match_operand 2 "const_int_operand" "O,wM,n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxlxor %x0,%x0,%x0
+   xxlorc %x0,%x0,%x0
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "veclogical,veclogical,vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+	(unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+			     (match_operand 2 "u1bit_cint_operand" "n")
+			     (match_operand 3 "const_int_operand" "n")]
+			    UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 ;; XXBLEND built-in function support
 (define_insn "xxblend_<mode>"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 0e87ad1f200..794c5de9516 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3336,6 +3336,14 @@ A constant whose negation is a signed 16-bit constant.
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.
 
+@item eJ
+A scalar constant that can be loaded with two prefixed instructions to
+a VSX register.
+
+@item eK
+A vector constant that can be loaded with two prefixed instructions to
+a VSX register.
+
 @item eS
 A scalar constant that can be loaded with one prefixed instruction to
 a VSX register.
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
new file mode 100644
index 00000000000..3b4b4e01d1b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+double
+df_double_pi (void)
+{
+  return M_PI;			/* 2x XXSPLTI32DX.  */
+}
+
+/* This float subnormal cannot be loaded with XXSPLTIDP.  */
+
+double
+df_double_denorm (void)
+{
+  return SUBNORMAL;		/* XXLXOR, XXSPLTI32DX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
new file mode 100644
index 00000000000..30ad33388e8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  We use asm to force the
+   value into vector registers.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+double
+scalar_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = SUBNORMAL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+double
+scalar_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  double d;
+  long long ll = LARGE_BITS;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
new file mode 100644
index 00000000000..8bc119ad41f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+#define M_PI		3.14159265358979323846
+#define SUBNORMAL	0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX.  */
+
+vector double
+v2df_double_pi (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector double) { M_PI, M_PI };
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  /* XXLXOR, XXSPLTI32DX.  */
+  return (vector double) { SUBNORMAL, SUBNORMAL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
new file mode 100644
index 00000000000..2730742752a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx -mxxspltiw" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX.  */
+
+#define LARGE_BITS	0x12345678ABCDEF01LL
+#define SUBNORMAL	0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP.  */
+vector long long
+vector_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { SUBNORMAL, SUBNORMAL };
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions.  */
+vector long long
+vector_large_constant (void)
+{
+  /* 2x XXSPLTI32DX.  */
+  return (vector long long) { LARGE_BITS, LARGE_BITS };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
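
For readers following the define_split in vsx.md, the ordering of the two
writes can be sketched in plain C.  This is only an illustration, not code
from the patch: plan_xxsplti32dx and struct splat_step are hypothetical names,
and the sketch assumes that index 0 selects the upper 32-bit word of each
doubleword and index 1 the lower word, mirroring how the split pairs
const0_rtx with the high value and const1_rtx with the low value.

```c
#include <stdint.h>

/* Hypothetical description of one write: which word index is set and the
   32-bit immediate placed there.  Not part of the patch.  */
struct splat_step
{
  int index;      /* Assumed: 0 = upper word, 1 = lower word.  */
  uint32_t imm;   /* 32-bit immediate for that word.  */
};

/* Plan the two writes for a 64-bit constant.  If the low word is 0 or all
   1s, it is written first, since (per the comment in the split) a shorter
   non-prefixed instruction can then stand in for the first XXSPLTI32DX.
   Otherwise the high word is written first.  */
static void
plan_xxsplti32dx (uint64_t value, struct splat_step steps[2])
{
  uint32_t high = (uint32_t) (value >> 32);
  uint32_t low = (uint32_t) value;

  if (low == 0 || low == UINT32_MAX)
    {
      steps[0] = (struct splat_step) { 1, low };
      steps[1] = (struct splat_step) { 0, high };
    }
  else
    {
      steps[0] = (struct splat_step) { 0, high };
      steps[1] = (struct splat_step) { 1, low };
    }
}
```

Under these assumptions, 0x36A0000000000000 (the bit pattern of 0x1p-149
as a double) has a zero low word, so the low word is planned first, which
corresponds to the "XXLXOR, XXSPLTI32DX" sequence noted in the tests.  A value
such as 0x12345678ABCDEF01 has the high word planned first and needs two
XXSPLTI32DX instructions.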

