[gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.

public inbox for gcc-cvs@sourceware.org
help / color / mirror / Atom feed

From: Michael Meissner <meissner@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
Date: Wed, 20 Oct 2021 21:46:27 +0000 (GMT)	[thread overview]
Message-ID: <20211020214627.23E0A3858D39@sourceware.org> (raw)

https://gcc.gnu.org/g:4119bec66ccc4ea84b7a01182004c65ad3ca070a

commit 4119bec66ccc4ea84b7a01182004c65ad3ca070a
Author: Michael Meissner <meissner@linux.ibm.com>
Date:   Wed Oct 20 17:46:00 2021 -0400

    Generate XXSPLTIDP on power10.
    
    This patch implements XXSPLTIDP support for SF, and DF scalar constants and all
    vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
    is converted to a vector of two DFmode constants.  The immediate is in SFmode
    format, so only constants that fit as SFmode values can be loaded with
    XXSPLTIDP.
    
    I added one new constraint (eP) which matches instructions that load a VSX
    register with one prefixed instruction.  This patch adds the XXSPLTIDP
    support.  The next patch will add the XXSPLTIW support.
    
    This patch depends on the previous patch that addes the rs6000_const structure
    and the constant_to_bytes function.
    
    DImode scalar constants are not handled.  This is due to the majority of DImode
    constants will be in the GPR registers.  With vector registers, you have the
    problem that XXSPLTIDP splats the double word into both elements of the
    vector.  However, if TImode is loaded with an integer constant, it wants a full
    128-bit constant.
    
    I have added a temporary switch (-msplat-float-constant) to control whether or
    not the XXSPLTIDP instruction is generated.
    
    I added 4 new tests to test loading up SF/DF scalar and vector constants.
    
    This patch updates the previous patch to take into account the comments
    from the patch review.   The main change is that this patch does is map
    each vector and scalar to provide all of bits and then match those bits to
    see if the XXSPLTIDP instruction can generate the bits necessary, even if
    the values in the vector aren't DFmode constants.
    
    2021-10-20  Michael Meissner  <meissner@the-meissners.org>
    
    gcc/
    
            * config/rs6000/constraints.md (eP): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (vsx_prefixed_constant): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (constant_generates_xxspltidp): New declaration.
            * config/rs6000/rs6000.c (prefixed_xxsplti_p): New function.
            (constant_generates_xxspltidp): New function.
            * config/rs6000/rs6000.md (UNSPEC_XXSPLTIDP_CONST): New unspec.
            (prefixed attribute): Add support for prefixed instructions to load
            constants into VSX registers.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (xxspltidp_<mode>_internal): New insns.
            (splitter for VSX prefix constants): New splitters.
            * config/rs6000/rs6000.opt (-msplat-float-constant): New debug option.
            * config/rs6000/vsx.md (vsx_mov<mode>_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov<mode>_32bit): Likewise.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eP constraint.
    
    gcc/testsuite/
    
            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
            * gcc.target/powerpc/vec-splati-runnable.c: Update insn counts.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  57 ++++++++
 gcc/config/rs6000/rs6000-protos.h                  |   2 +
 gcc/config/rs6000/rs6000.c                         | 149 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  86 +++++++++---
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  32 ++++-
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 +++++++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 +++++++
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 13 files changed, 552 insertions(+), 29 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..7d594872a78 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,12 @@
   "A signed 34-bit integer constant if prefixed instructions are supported."
   (match_operand 0 "cint34_operand"))
 
+;; A SF/DF scalar constant or a vector constant that can be loaded into vector
+;; registers with one prefixed instruction such as XXSPLTIDP.
+(define_constraint "eP"
+  "A constant that can be loaded into a VSX register with one prefixed insn."
+  (match_operand 0 "vsx_prefixed_constant"))
+
 ;; Floating-point constraints.  These two are defined so that insn
 ;; length attributes can be calculated exactly.
 
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..b97b20a1c88 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,16 @@
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* Constants that can be generated with ISA 3.1 instructions are easy.  */
+  rs6000_const vsx_const;
+
+  if (TARGET_POWER10
+      && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -609,6 +619,42 @@
    return 0;
 })
 
+;; Return 1 if the operand is a 64-bit floating point scalar constant or a
+;; vector constant that can be loaded to a VSX register with one prefixed
+;; instruction, such as XXSPLTIDP.
+;;
+;; In addition regular constants, we also recognize constants formed with the
+;; VEC_DUPLICATE insn from scalar constants.
+;;
+;; We don't handle scalar integer constants here because the assumption is the
+;; normal integer constants will be loaded into GPR registers.  For the
+;; constants that need to be loaded into vector registers, the instructions
+;; don't work well with TImode variables assigned a constant.  This is because
+;; the 64-bit scalar constants are splatted into both halves of the register.
+
+(define_predicate "vsx_prefixed_constant"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  /* If we can generate the constant with 1-2 Altivec instructions, don't
+      generate a prefixed instruction.  */
+  if (CONST_VECTOR_P (op) && easy_altivec_constant (op, mode))
+    return false;
+
+  /* Do we have prefixed instructions and are VSX registers available?  Is the
+     constant recognized?  */
+  if (!TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  rs6000_const vsx_const;
+  if (!constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    return false;
+
+  if (constant_generates_xxspltidp (&vsx_const))
+    return true;
+
+  return false;
+})
+
 ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB
 ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction.
 
@@ -653,6 +699,17 @@
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      /* Constants that can be generated with ISA 3.1 instructions are
+         easy.  */
+      rs6000_const vsx_const;
+
+      if (TARGET_POWER10
+	  && constant_to_bytes (op, mode, &vsx_const, RS6000_CONST_NO_SPLAT))
+	{
+	  if (constant_generates_xxspltidp (&vsx_const))
+	    return true;
+	}
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index a3fecbb7812..ec4f78d9241 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_xxsplti_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
@@ -257,6 +258,7 @@ typedef struct {
 
 extern bool constant_to_bytes (rtx, machine_mode, rs6000_const *,
 			       rs6000_const_splat);
+extern unsigned constant_generates_xxspltidp (rs6000_const *);
 #endif /* RTX_CODE */
 
 #ifdef TREE_CODE
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e2f48f5a1e2..40c7e5ceddf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6990,6 +6990,21 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (TARGET_PREFIXED)
+	{
+	  rs6000_const vsx_const;
+	  if (constant_to_bytes (vec, mode, &vsx_const,
+				 RS6000_CONST_SPLAT_16_BYTES))
+	    {
+	      unsigned imm = constant_generates_xxspltidp (&vsx_const);
+	      if (imm)
+		{
+		  operands[2] = GEN_INT (imm);
+		  return "xxspltidp %x0,%2";
+		}
+	    }
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26724,6 +26739,41 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether an instruction is a prefixed XXSPLTI* instruction.  This is called
+   from the prefixed attribute processing.  */
+
+bool
+prefixed_xxsplti_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  if (GET_CODE (src) == UNSPEC)
+    {
+      int unspec = XINT (src, 1);
+      return (unspec == UNSPEC_XXSPLTIW
+	      || unspec == UNSPEC_XXSPLTIDP
+	      || unspec == UNSPEC_XXSPLTI32DX);
+    }
+
+  rs6000_const vsx_const;
+  if (constant_to_bytes (src, mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    {
+      if (constant_generates_xxspltidp (&vsx_const))
+	return true;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
@@ -28852,6 +28902,105 @@ constant_to_bytes (rtx op,
   return true;
 }
 
+
+/* Determine if a vector constant can be loaded with XXSPLTIDP.  Return zero if
+   the XXSPLTIDP instruction cannot be used.  Otherwise return the immediate
+   value to be used with the XXSPLTIDP instruction.  */
+
+unsigned
+constant_generates_xxspltidp (rs6000_const *vsx_const)
+{
+  if (!TARGET_SPLAT_FLOAT_CONSTANT || !TARGET_PREFIXED || !TARGET_VSX)
+    return 0;
+
+  /* Make sure that the two 64-bit segments are the same.  */
+  if (!vsx_const->all_double_words_same)
+    return 0;
+
+  /* If the bytes, half words, or words are all the same, don't use XXSPLTIDP.
+     Use a simpler instruction (XXSPLTIB, VSPLTISB, VSPLTISH, or VSPLTISW).  */
+  if (vsx_const->all_bytes_same
+      || vsx_const->all_half_words_same
+      || vsx_const->all_words_same)
+    return 0;
+
+  unsigned HOST_WIDE_INT value = vsx_const->double_words[0];
+
+  /* Avoid values that look like DFmode NaN's, except for the normal NaN bit
+     pattern and the signalling NaN bit pattern.  Recognize infinity and
+     negative infinity.  */
+
+  /* Bit representation of DFmode normal quiet NaN.  */
+#define RS6000_CONST_DF_NAN	HOST_WIDE_INT_UC (0x7ff8000000000000)
+
+  /* Bit representation of DFmode normal signaling NaN.  */
+#define RS6000_CONST_DF_NANS	HOST_WIDE_INT_UC (0x7ff4000000000000)
+
+  /* Bit representation of DFmode positive infinity.  */
+#define RS6000_CONST_DF_INF	HOST_WIDE_INT_UC (0x7ff0000000000000)
+
+  /* Bit representation of DFmode negative infinity.  */
+#define RS6000_CONST_DF_NEG_INF	HOST_WIDE_INT_UC (0xfff0000000000000)
+
+  if (value != RS6000_CONST_DF_NAN
+      && value != RS6000_CONST_DF_NANS
+      && value != RS6000_CONST_DF_INF
+      && value != RS6000_CONST_DF_NEG_INF)
+    {
+      /* The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for
+	 the exponent, and 52 bits for the mantissa (not counting the hidden
+	 bit used for normal numbers).  NaN values have the exponent set to all
+	 1 bits, and the mantissa non-zero (mantissa == 0 is infinity).  */
+
+      int df_exponent = (value >> 52) & 0x7ff;
+      unsigned HOST_WIDE_INT df_mantissa
+	= value & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U);
+
+      if (df_exponent == 0x7ff && df_mantissa != 0)	/* other NaNs.  */
+	return 0;
+
+      /* Avoid values that are DFmode subnormal values.  Subnormal numbers have
+	 the exponent all 0 bits, and the mantissa non-zero.  If the value is
+	 subnormal, then the hidden bit in the mantissa is not set.  */
+      if (df_exponent == 0 && df_mantissa != 0)		/* subnormal.  */
+	return 0;
+    }
+
+  /* Change the representation to DFmode constant.  */
+  long df_words[2] = { vsx_const->words[0], vsx_const->words[1] };
+
+  /* real_from_target takes the target words in  target order.  */
+  if (!BYTES_BIG_ENDIAN)
+    std::swap (df_words[0], df_words[1]);
+
+  REAL_VALUE_TYPE rv_type;
+  real_from_target (&rv_type, df_words, DFmode);
+
+  const REAL_VALUE_TYPE *rv = &rv_type;
+
+  /* Validate that the number can be stored as a SFmode value.  */
+  if (!exact_real_truncate (SFmode, rv))
+    return 0;
+
+  /* Validate that the number is not a SFmode subnormal value (exponent is 0,
+     mantissa field is non-zero) which is undefined for the XXSPLTIDP
+     instruction.  */
+  long sf_value;
+  real_to_target (&sf_value, rv, SFmode);
+
+  /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent,
+     and 23 bits for the mantissa.  Subnormal numbers have the exponent all
+     0 bits, and the mantissa non-zero.  */
+  long sf_exponent = (sf_value >> 23) & 0xFF;
+  long sf_mantissa = sf_value & 0x7FFFFF;
+
+  if (sf_exponent == 0 && sf_mantissa != 0)
+    return 0;
+
+  /* Return the immediate to be used.  */
+  return sf_value;
+}
+
 \f
 struct gcc_target targetm = TARGET_INITIALIZER;
 
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 6bec2bddbde..218645aa240 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -156,6 +156,7 @@
    UNSPEC_PEXTD
    UNSPEC_HASHST
    UNSPEC_HASHCHK
+   UNSPEC_XXSPLTIDP_CONST
   ])
 
 ;;
@@ -314,6 +315,11 @@
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_xxsplti_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7759,17 +7765,17 @@
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eP"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7791,15 +7797,16 @@
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8059,18 +8066,18 @@
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eP"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8087,20 +8094,21 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8127,19 +8135,19 @@
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eP"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8161,18 +8169,19 @@
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8206,6 +8215,47 @@
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
+;; Split the VSX prefixed instruction to support SFmode and DFmode scalar
+;; constants that look like DFmode floating point values where both elements
+;; are the same.  The constant has to be expressible as a SFmode constant that
+;; is not a SFmode denormal value.
+;;
+;; We don't need splitters for the 128-bit types, since the function
+;; rs6000_output_move_128bit handles the generation of XXSPLTIDP.
+(define_insn "xxspltidp_<mode>_internal"
+  [(set (match_operand:SFDF 0 "register_operand" "=wa")
+	(unspec:SFDF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+		     UNSPEC_XXSPLTIDP_CONST))]
+  "TARGET_POWER10"
+  "xxspltidp %x0,%1"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+(define_split
+  [(set (match_operand:SFDF 0 "vsx_register_operand")
+	(match_operand:SFDF 1 "vsx_prefixed_constant"))]
+  "TARGET_POWER10"
+  [(pc)]
+{
+  rtx dest = operands[0];
+  rtx src = operands[1];
+  rs6000_const vsx_const;
+
+  if (!constant_to_bytes (src, <MODE>mode, &vsx_const, RS6000_CONST_SPLAT_16_BYTES))
+    gcc_unreachable ();
+
+  unsigned imm = constant_generates_xxspltidp (&vsx_const);
+  if (imm)
+    {
+      emit_insn (gen_xxspltidp_<mode>_internal (dest, GEN_INT (imm)));
+      DONE;
+    }
+
+  else
+    gcc_unreachable ();
+})
+
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 9d7878f144a..429da57d19d 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -640,6 +640,10 @@ mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
 
+msplat-float-constant
+Target Var(TARGET_SPLAT_FLOAT_CONSTANT) Init(1) Save
+Generate (do not generate) code that uses the XXSPLTIDP instruction.
+
 -param=rs6000-density-pct-threshold=
 Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param
 When costing for loop vectorization, we probably need to penalize the loop body
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..0ceecc1975c 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1192,16 +1192,19 @@
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
+;;              XXLSPLTI*
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
+                wa,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
                 wQ,        Y,         r,         r,         wE,        jwM,
+                eP,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
   "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1213,36 +1216,47 @@
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
                 store,     load,      store,     *,         vecsimple, vecsimple,
+                vecperm,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
                 2,         2,         2,         2,         *,         *,
+                *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
                 8,         8,         8,         8,         *,         *,
+                *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
                 *,         *,         *,         *,         p9v,       *,
+                p10,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
-;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
+;;              XXSPLTIB   VSPLTISW   VSX 0/-1
+;;              XXSPLTI*
+;;              VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
-                wa,        v,         ?wa,       v,         <??r>,
+                wa,        v,         ?wa,
+                wa,
+                v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
-                wE,        jwM,       ?jwM,      W,         <nW>,
+                wE,        jwM,       ?jwM,
+                eP,
+                W,         <nW>,
                 v,         wZ"))]
 
   "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1253,15 +1267,21 @@
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
-                vecsimple, vecsimple, vecsimple, *,         *,
+                vecsimple, vecsimple, vecsimple,
+                vecperm,
+                *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
-                *,         *,         *,         20,        16,
+                *,         *,         *,
+                *,
+                20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
-                p9v,       *,         <VSisa>,   *,         *,
+                p9v,       *,         <VSisa>,
+                p10,
+                *,         *,
                 *,         *")])
 
 ;; Explicit  load/store expanders for the builtin functions
diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }
 
-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */
 
 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTISDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTISDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index a135279b1d7..5f84930e1a7 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -150,7 +150,7 @@ main (int argc, char *argv [])
 }
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 3 } } */
 /* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */

next             reply	other threads:[~2021-10-20 21:46 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-20 21:46 Michael Meissner [this message]
  -- strict thread matches above, loose matches on Subject: below --
2021-10-21  2:33 Michael Meissner
2021-10-20 22:44 Michael Meissner
2021-10-18 17:53 Michael Meissner
2021-10-18 17:02 Michael Meissner
2021-10-15  2:42 Michael Meissner
2021-10-14 16:49 Michael Meissner
2021-10-14 15:56 Michael Meissner
2021-10-14 14:57 Michael Meissner
2021-10-14  3:24 Michael Meissner
2021-10-12 22:05 Michael Meissner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20211020214627.23E0A3858D39@sourceware.org \
    --to=meissner@gcc.gnu.org \
    --cc=gcc-cvs@gcc.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).