[PATCH] Generate XXSPLTIDP on power10.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH] Generate XXSPLTIDP on power10.
@ 2021-08-25 19:46 Michael Meissner
  2021-08-26 19:17 ` will schmidt
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Michael Meissner @ 2021-08-25 19:46 UTC (permalink / raw)
  To: gcc-patches, Michael Meissner, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Generate XXSPLTIDP on power10.

This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF
vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
is converted to a vector of two DFmode constants.  The immediate is in SFmode
format, so only constants that fit as SFmode values can be loaded with
XXSPLTIDP.

I added a new constraint (eF) to match constants that can be loaded with the
XXSPLTIDP instruction.

I have added a temporary switch (-mxxspltidp) to control whether or not the
XXSPLTIDP instruction is generated.

I added 3 new tests to test loading up SF/DF scalar and V2DF vector
constants.

I have tested this with bootstrap compilers on power10 systems and there was no
regression.  I have built GCC with these patches on little endian power9 and
big endian power8 systems, and there were no regressions.

In addition, I have built and run the full Spec 2017 rate suite, comparing with
the patches enabled and not enabled.  There were roughly 66,000 XXSPLTIDP's
generated in the rate build for Spec 2017.  On a stand-alone system that is
running single threaded, blender_r has a 1.9% increase in performance, and rest
of the benchmarks are performance neutral.  However, I would expect that in a
real world scenario, switching to use XXSPLTIDP will increase performance due
to removing all of the loads.

Can I check this into the master branch?

2021-08-25  Michael Meissner  <meissner@linux.ibm.com>

gcc/
	* config/rs6000/constraints.md (eF): New constraint.
	* config/rs6000/predicates.md (easy_fp_constant): If we can load
	the scalar constant with XXSPLTIDP, the floating point constant is
	easy.
	(xxspltidp_operand): New predicate.
	(easy_vector_constant): If we can generate XXSPLTIDP, mark the
	vector constant as easy.
	* config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
	declaration.
	(prefixed_permute_p): Likewise.
	* config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
	(output_vec_const_move): Add support for XXSPLTIDP.
	(prefixed_permute_p): New function.
	* config/rs6000/rs6000.md (prefixed attribute): Add support for
	permute prefixed instructions.
	(movsf_hardfloat): Add XXSPLTIDP support.
	(mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
	(mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
	* config/rs6000/rs6000.opt (-mxxspltidp): New switch.
	* config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
	support.
	(vsx_move<mode>_32bit): Likewise.
	(vsx_splat_v2df_xxspltidp): New insn.
	(XXSPLTIDP): New mode iterator.
	(xxspltidp_<mode>_internal): New insn and splits.
	(xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
	iterated form that also does SFmode, and DFmode.

gcc/testsuite/
	* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
	* gcc.target/powerpc/vec-splat-constant-df.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
---
 gcc/config/rs6000/constraints.md              |   5 +
 gcc/config/rs6000/predicates.md               |  17 +++
 gcc/config/rs6000/rs6000-protos.h             |   2 +
 gcc/config/rs6000/rs6000.c                    | 106 ++++++++++++++++++
 gcc/config/rs6000/rs6000.md                   |  45 +++++---
 gcc/config/rs6000/rs6000.opt                  |   4 +
 gcc/config/rs6000/vsx.md                      |  64 ++++++++++-
 .../powerpc/vec-splat-constant-df.c           |  60 ++++++++++
 .../powerpc/vec-splat-constant-sf.c           |  60 ++++++++++
 .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
 10 files changed, 405 insertions(+), 22 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..ea2e4a267c3 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@ (define_constraint "P"
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
+(define_constraint "eF"
+  "A vector constant that can be loaded with the XXSPLTIDP instruction."
+  (match_operand 0 "xxspltidp_operand"))
+
 ;; 34-bit signed integer constant
 (define_constraint "eI"
   "A signed 34-bit integer constant if prefixed instructions are supported."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 956e42bc514..134243e404b 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
   if (TARGET_VSX && op == CONST0_RTX (mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
+     be loaded with that instruction.  */
+  if (xxspltidp_operand (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases.  This
      has the advantage that double precision constants that can be
@@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit"
   return num_insns == 1;
 })
 
+;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
+;; loaded via the ISA 3.1 XXSPLTIDP instruction.
+(define_predicate "xxspltidp_operand"
+  (match_code "const_double,const_vector,vec_duplicate")
+{
+  HOST_WIDE_INT value = 0;
+  return xxspltidp_constant_p (op, mode, &value);
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant"
       if (zero_constant (op, mode) || all_ones_constant (op, mode))
 	return true;
 
+      if (xxspltidp_operand (op, mode))
+	return true;
+
       if (TARGET_P9_VECTOR
           && xxspltib_constant_p (op, mode, &num_insns, &value))
 	return true;
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 14f6b313105..9bba57c22f2 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 
 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
@@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
 extern bool prefixed_load_p (rtx_insn *);
 extern bool prefixed_store_p (rtx_insn *);
 extern bool prefixed_paddi_p (rtx_insn *);
+extern bool prefixed_permute_p (rtx_insn *);
 extern void rs6000_asm_output_opcode (FILE *);
 extern void output_pcrel_opt_reloc (rtx);
 extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e073b26b430..322b3c83925 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op,
   return true;
 }
 
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTIDP instruction.
+
+   Return the constant that is being split via CONSTANT_PTR to use in the
+   XXSPLTIDP instruction.  */
+
+bool
+xxspltidp_constant_p (rtx op,
+		      machine_mode mode,
+		      HOST_WIDE_INT *constant_ptr)
+{
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  rtx element = op;
+  if (mode == V2DFmode)
+    {
+      if (CONST_VECTOR_P (op))
+	{
+	  element = CONST_VECTOR_ELT (op, 0);
+	  if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
+	    return false;
+	}
+
+      else if (GET_CODE (op) == VEC_DUPLICATE)
+	element = XEXP (op, 0);
+
+      else
+	return false;
+
+      mode = DFmode;
+    }
+
+  if (mode != SFmode && mode != DFmode)
+    return false;
+
+  if (GET_MODE (element) != mode)
+    return false;
+
+  if (!CONST_DOUBLE_P (element))
+    return false;
+
+  /* Don't return true for 0.0 since that is easy to create without
+     XXSPLTIDP.  */
+  if (element == CONST0_RTX (mode))
+    return false;
+
+  /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP.  */
+  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
+  if (!exact_real_truncate (SFmode, rv))
+    return false;
+
+  long value;
+  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
+
+  /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero).  */
+  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
+    return false;
+
+  *constant_ptr = value;
+  return true;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands)
     {
       bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
       int xxspltib_value = 256;
+      HOST_WIDE_INT xxspltidp_value = 0;
       int num_insns = -1;
 
       if (zero_constant (vec, mode))
@@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands)
 	    gcc_unreachable ();
 	}
 
+      if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
+	{
+	  operands[2] = GEN_INT (xxspltidp_value);
+	  return "xxspltidp %x0,%2";
+	}
+
       if (TARGET_P9_VECTOR
 	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
 	{
@@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn)
   return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
 }
 
+/* Whether a permute type instruction is a prefixed instruction.  This is
+   called from the prefixed attribute processing.  */
+
+bool
+prefixed_permute_p (rtx_insn *insn)
+{
+  rtx set = single_set (insn);
+  if (!set)
+    return false;
+
+  rtx dest = SET_DEST (set);
+  rtx src = SET_SRC (set);
+  machine_mode mode = GET_MODE (dest);
+
+  if (!REG_P (dest) && !SUBREG_P (dest))
+    return false;
+
+  switch (mode)
+    {
+    case DFmode:
+    case SFmode:
+    case V2DFmode:
+      return xxspltidp_operand (src, mode);
+
+    default:
+      break;
+    }
+
+  return false;
+}
+
 /* Whether the next instruction needs a 'p' prefix issued before the
    instruction is printed out.  */
 static bool prepend_p_to_next_insn;
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index a84438f8545..bf3bfed3b88 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
 
 	 (eq_attr "type" "integer,add")
 	 (if_then_else (match_test "prefixed_paddi_p (insn)")
+		       (const_string "yes")
+		       (const_string "no"))
+
+	 (eq_attr "type" "vecperm")
+	 (if_then_else (match_test "prefixed_permute_p (insn)")
 		       (const_string "yes")
 		       (const_string "no"))]
 
@@ -7723,17 +7728,17 @@ (define_split
 ;;
 ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
 ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
-;;	MR           MT<x>      MF<x>       NOP
+;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
 	 "=!r,       f,         v,          wa,        m,         wY,
 	  Z,         m,         wa,         !r,        f,         wa,
-	  !r,        *c*l,      !r,         *h")
+	  !r,        *c*l,      !r,         *h,        wa")
 	(match_operand:SF 1 "input_operand"
 	 "m,         m,         wY,         Z,         f,         v,
 	  wa,        r,         j,          j,         f,         wa,
-	  r,         r,         *h,         0"))]
+	  r,         r,         *h,         0,         eF"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
    mr %0,%1
    mt%0 %1
    mf%1 %0
-   nop"
+   nop
+   #"
   [(set_attr "type"
 	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
 	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
-	 *,          mtjmpr,    mfjmpr,     *")
+	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
    (set_attr "isa"
 	"*,          *,         p9v,        p8v,       *,         p9v,
 	 p8v,        *,         *,          *,         *,         *,
-	 *,          *,         *,          *")])
+	 *,          *,         *,          *,         p10")])
 
 ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
 ;;	FMR          MR         MT%0       MF%1       NOP
@@ -8023,18 +8029,18 @@ (define_split
 
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
-;;           LWZ          STW         MR
+;;           LWZ          STW         MR          XXSPLTIDP
 
 
 (define_insn "*mov<mode>_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
             "=m,          d,          d,          <f64_p9>,   wY,
               <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
-              Y,          r,          !r")
+              Y,          r,          !r,         wa")
 	(match_operand:FMOVE64 1 "input_operand"
              "d,          m,          d,          wY,         <f64_p9>,
               Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
-              r,          Y,          r"))]
+              r,          Y,          r,          eF"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
    #
    #
    #
+   #
    #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, two,
-             store,       load,       two")
+             store,       load,       two,        vecperm")
    (set_attr "size" "64")
    (set_attr "length"
             "*,           *,          *,          *,          *,
              *,           *,          *,          *,          8,
-             8,           8,          8")
+             8,           8,          8,          *")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
-             *,           *,          *")])
+             *,           *,          *,          p10")])
 
 ;;           STW      LWZ     MR      G-const H-const F-const
 
@@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
 ;;           STFD         LFD         FMR         LXSD        STXSD
 ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
 ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
-;;           NOP          MFVSRD      MTVSRD
+;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
 
 (define_insn "*mov<mode>_hardfloat64"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
            "=m,           d,          d,          <f64_p9>,   wY,
              <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
              YZ,          r,          !r,         *c*l,       !r,
-            *h,           r,          <f64_dm>")
+            *h,           r,          <f64_dm>,   wa")
 	(match_operand:FMOVE64 1 "input_operand"
             "d,           m,          d,          wY,         <f64_p9>,
              Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
              r,           YZ,         r,          r,          *h,
-             0,           <f64_dm>,   r"))]
+             0,           <f64_dm>,   r,          eF"))]
   "TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], <MODE>mode)
        || gpc_reg_operand (operands[1], <MODE>mode))"
@@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
    mf%1 %0
    nop
    mfvsrd %0,%x1
-   mtvsrd %x0,%1"
+   mtvsrd %x0,%1
+   #"
   [(set_attr "type"
             "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
              fpload,      fpstore,    veclogical, veclogical, integer,
              store,       load,       *,          mtjmpr,     mfjmpr,
-             *,           mfvsr,      mtvsr")
+             *,           mfvsr,      mtvsr,      vecperm")
    (set_attr "size" "64")
    (set_attr "isa"
             "*,           *,          *,          p9v,        p9v,
              p7v,         p7v,        *,          *,          *,
              *,           *,          *,          *,          *,
-             *,           p8v,        p8v")])
+             *,           p8v,        p8v,        p10")])
 
 ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
 ;;           H-const  F-const  Special
@@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
    (set_attr "length"
             "*,       *,      *,      *,      *,      8,
              12,      16,     *")])
+
 \f
 (define_expand "mov<mode>"
   [(set (match_operand:FMOVE128 0 "general_operand")
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0538db387dc..928c4fafe07 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
 mprivileged
 Target Var(rs6000_privileged) Init(0)
 Generate code that will run in privileged state.
+
+mxxspltidp
+Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
+Generate (do not generate) XXSPLTIDP instructions.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bf033e31c1c..af9a04870d4 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
 ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
 
 ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
+;;		XXSPLTIDP
 ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
 ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
 (define_insn "vsx_mov<mode>_64bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        r,         we,        ?wQ,
+                wa,
                 ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
                 ?wa,       v,         <??r>,     wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        we,        r,         r,
+                eF,
                 wQ,        Y,         r,         r,         wE,        jwM,
                 ?jwM,      W,         <nW>,      v,         wZ"))]
 
@@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
+                vecperm,
                 store,     load,      store,     *,         vecsimple, vecsimple,
                 vecsimple, *,         *,         vecstore,  vecload")
    (set_attr "num_insns"
                "*,         *,         *,         2,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         5,         2,         *,         *")
    (set_attr "max_prefixed_insns"
                "*,         *,         *,         *,         *,         2,
+                *,
                 2,         2,         2,         2,         *,         *,
                 *,         *,         *,         *,         *")
    (set_attr "length"
                "*,         *,         *,         8,         *,         8,
+                *,
                 8,         8,         8,         8,         *,         *,
                 *,         20,        8,         *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 *,         *,         *,         *,         p9v,       *,
                 <VSisa>,   *,         *,         *,         *")])
 
 ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
+;;		XXSPLTIDP
 ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
 ;;              LVX (VMX)  STVX (VMX)
 (define_insn "*vsx_mov<mode>_32bit"
   [(set (match_operand:VSX_M 0 "nonimmediate_operand"
                "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
+                wa,
                 wa,        v,         ?wa,       v,         <??r>,
                 wZ,        v")
 
 	(match_operand:VSX_M 1 "input_operand" 
                "wa,        ZwO,       wa,        Y,         r,         r,
+                eF,
                 wE,        jwM,       ?jwM,      W,         <nW>,
                 v,         wZ"))]
 
@@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
 }
   [(set_attr "type"
                "vecstore,  vecload,   vecsimple, load,      store,    *,
+                vecperm,
                 vecsimple, vecsimple, vecsimple, *,         *,
                 vecstore,  vecload")
    (set_attr "length"
                "*,         *,         *,         16,        16,        16,
+                *,
                 *,         *,         *,         20,        16,
                 *,         *")
    (set_attr "isa"
                "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
+                p10,
                 p9v,       *,         <VSisa>,   *,         *,
                 *,         *")])
 
@@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg"
    mtvsrdd %x0,%1,%1"
   [(set_attr "type" "vecperm,vecmove")])
 
+(define_insn "*vsx_splat_v2df_xxspltidp"
+  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
+	(vec_duplicate:V2DF
+	 (match_operand:DF 1 "xxspltidp_operand" "eF")))]
+  "TARGET_POWER10"
+{
+  HOST_WIDE_INT value;
+
+  if (!xxspltidp_constant_p (operands[1], DFmode, &value))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (value);
+  return "xxspltidp %x0,%1";
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
 (define_insn "vsx_splat_<mode>_mem"
   [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
 	(vec_duplicate:VSX_D
@@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df"
   DONE;
 })
 
-(define_insn "xxspltidp_v2df_inst"
-  [(set (match_operand:V2DF 0 "register_operand" "=wa")
-	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
-		     UNSPEC_XXSPLTIDP))]
+(define_mode_iterator XXSPLTIDP [SF DF V2DF])
+
+(define_insn "xxspltidp_<mode>_inst"
+  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
+	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
+			  UNSPEC_XXSPLTIDP))]
   "TARGET_POWER10"
   "xxspltidp %x0,%1"
   [(set_attr "type" "vecperm")
    (set_attr "prefixed" "yes")])
 
+;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar
+;; constants and V2DF vector constants where both elements are the same.  The
+;; constant has to be expressible as a SFmode constant that is not a SFmode
+;; denormal value.
+(define_insn_and_split "*xxspltidp_<mode>_internal"
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
+	(match_operand:XXSPLTIDP 1 "xxspltidp_operand"     "eF"))]
+  "TARGET_POWER10"
+  "#"
+  "&& 1"
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  HOST_WIDE_INT value = 0;
+
+  if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (value);
+}
+ [(set_attr "type" "vecperm")
+  (set_attr "prefixed" "yes")])
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;			/* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;			/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();	/* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;			/* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;		/* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;			/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();	/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)	/* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;		/* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;		/* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include <math.h>
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+			   __builtin_nan ("") };	/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+			   __builtin_inf () };		/* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+			   - __builtin_inf () };	/* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };		/* PLVX.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+			   (double)0x1p-149f };		/* PLVX.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
-- 
2.31.1


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-25 19:46 [PATCH] Generate XXSPLTIDP on power10 Michael Meissner
@ 2021-08-26 19:17 ` will schmidt
  2021-08-26 21:28   ` Michael Meissner
  2021-08-31 14:50 ` Ping: " Michael Meissner
  2021-08-31 23:41 ` Segher Boessenkool
  2 siblings, 1 reply; 10+ messages in thread
From: will schmidt @ 2021-08-26 19:17 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP on power10.
> 
> This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF
> vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
> is converted to a vector of two DFmode constants.  The immediate is in SFmode
> format, so only constants that fit as SFmode values can be loaded with
> XXSPLTIDP.

ok

> 
> I added a new constraint (eF) to match constants that can be loaded with the
> XXSPLTIDP instruction.

> 
> I have added a temporary switch (-mxxspltidp) to control whether or not the
> XXSPLTIDP instruction is generated.

How temporary?  

> 
> I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> constants.
> 
> I have tested this with bootstrap compilers on power10 systems and there was no
> regression.  I have built GCC with these patches on little endian power9 and
> big endian power8 systems, and there were no regressions.
> 
> In addition, I have built and run the full Spec 2017 rate suite, comparing with
> the patches enabled and not enabled.  There were roughly 66,000 XXSPLTIDP's
> generated in the rate build for Spec 2017.  On a stand-alone system that is
> running single threaded, blender_r has a 1.9% increase in performance, and rest
> of the benchmarks are performance neutral.  However, I would expect that in a
> real world scenario, switching to use XXSPLTIDP will increase performance due
> to removing all of the loads.

ok

> 
> Can I check this into the master branch?
> 
> 2021-08-25  Michael Meissner  <meissner@linux.ibm.com>
> 
> gcc/
> 	* config/rs6000/constraints.md (eF): New constraint.
> 	* config/rs6000/predicates.md (easy_fp_constant): If we can load
> 	the scalar constant with XXSPLTIDP, the floating point constant is
> 	easy.

Could be shortened to something like ? 
  Add clause to accept xxspltidp_operand as easy.

> 	(xxspltidp_operand): New predicate.

Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
but defer.. 

> 	(easy_vector_constant): If we can generate XXSPLTIDP, mark the
> 	vector constant as easy.

Duplicated from above.

> 	* config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
> 	declaration.
> 	(prefixed_permute_p): Likewise.


> 	* config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
> 	(output_vec_const_move): Add support for XXSPLTIDP.
> 	(prefixed_permute_p): New function.

Duplicated.

> 	* config/rs6000/rs6000.md (prefixed attribute): Add support for
> 	permute prefixed instructions.
> 	(movsf_hardfloat): Add XXSPLTIDP support.
> 	(mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
> 	(mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
> 	* config/rs6000/rs6000.opt (-mxxspltidp): New switch.
> 	* config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
> 	support.
> 	(vsx_move<mode>_32bit): Likewise.

No e in mov (per patch contents below).

> 	(vsx_splat_v2df_xxspltidp): New insn.
> 	(XXSPLTIDP): New mode iterator.
> 	(xxspltidp_<mode>_internal): New insn and splits.
> 	(xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
> 	iterated form that also does SFmode, and DFmode.
Swap "an iterated form" with "xxspltidp_<mode>_inst  ?




> 
> gcc/testsuite/
> 	* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-df.c: New test.
> 	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> ---
>  gcc/config/rs6000/constraints.md              |   5 +
>  gcc/config/rs6000/predicates.md               |  17 +++
>  gcc/config/rs6000/rs6000-protos.h             |   2 +
>  gcc/config/rs6000/rs6000.c                    | 106 ++++++++++++++++++
>  gcc/config/rs6000/rs6000.md                   |  45 +++++---
>  gcc/config/rs6000/rs6000.opt                  |   4 +
>  gcc/config/rs6000/vsx.md                      |  64 ++++++++++-
>  .../powerpc/vec-splat-constant-df.c           |  60 ++++++++++
>  .../powerpc/vec-splat-constant-sf.c           |  60 ++++++++++
>  .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
>  10 files changed, 405 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..ea2e4a267c3 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -208,6 +208,11 @@ (define_constraint "P"
>    (and (match_code "const_int")
>         (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
> 
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> +  (match_operand 0 "xxspltidp_operand"))
> +
>  ;; 34-bit signed integer constant
>  (define_constraint "eI"
>    "A signed 34-bit integer constant if prefixed instructions are supported."
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..134243e404b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
>    if (TARGET_VSX && op == CONST0_RTX (mode))
>      return 1;
> 
> +  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
> +     be loaded with that instruction.  */
> +  if (xxspltidp_operand (op, mode))
> +    return 1;
> +
>    /* Otherwise consider floating point constants hard, so that the
>       constant gets pushed to memory during the early RTL phases.  This
>       has the advantage that double precision constants that can be
> @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit"
>    return num_insns == 1;
>  })
> 
> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> +;; loaded via the ISA 3.1 XXSPLTIDP instruction.

"Return 1 if" doesnt seem right given the return statement here.

> +(define_predicate "xxspltidp_operand"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  HOST_WIDE_INT value = 0;
> +  return xxspltidp_constant_p (op, mode, &value);
> +})
> +
>  ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
>  ;; vector register without using memory.
>  (define_predicate "easy_vector_constant"
> @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant"
>        if (zero_constant (op, mode) || all_ones_constant (op, mode))
>  	return true;
> 
> +      if (xxspltidp_operand (op, mode))
> +	return true;
> +
>        if (TARGET_P9_VECTOR
>            && xxspltib_constant_p (op, mode, &num_insns, &value))
>  	return true;
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 14f6b313105..9bba57c22f2 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
> 
>  extern int easy_altivec_constant (rtx, machine_mode);
>  extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
> +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
>  extern bool prefixed_load_p (rtx_insn *);
>  extern bool prefixed_store_p (rtx_insn *);
>  extern bool prefixed_paddi_p (rtx_insn *);
> +extern bool prefixed_permute_p (rtx_insn *);
>  extern void rs6000_asm_output_opcode (FILE *);
>  extern void output_pcrel_opt_reloc (rtx);
>  extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e073b26b430..322b3c83925 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op,
>    return true;
>  }
> 
> +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
> +   XXSPLTIDP instruction.
> +
> +   Return the constant that is being split via CONSTANT_PTR to use in the
> +   XXSPLTIDP instruction.  */

Appears to return true or false.  Is the "Return the constant" comment
meant to go on the predicate definition earlier?

> +
> +bool
> +xxspltidp_constant_p (rtx op,
> +		      machine_mode mode,
> +		      HOST_WIDE_INT *constant_ptr)
> +{
> +  *constant_ptr = 0;
> +
> +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> +    return false;
> +
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op);
> +
> +  rtx element = op;
> +  if (mode == V2DFmode)
> +    {
> +      if (CONST_VECTOR_P (op))
> +	{
> +	  element = CONST_VECTOR_ELT (op, 0);
> +	  if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> +	    return false;
> +	}
> +
> +      else if (GET_CODE (op) == VEC_DUPLICATE)
> +	element = XEXP (op, 0);
> +
> +      else
> +	return false;
> +
> +      mode = DFmode;
> +    }
> +
> +  if (mode != SFmode && mode != DFmode)
> +    return false;
> +
> +  if (GET_MODE (element) != mode)
> +    return false;
> +
> +  if (!CONST_DOUBLE_P (element))
> +    return false;
> +
> +  /* Don't return true for 0.0 since that is easy to create without
> +     XXSPLTIDP.  */
> +  if (element == CONST0_RTX (mode))
> +    return false;
> +
> +  /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP.  */
> +  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
> +  if (!exact_real_truncate (SFmode, rv))
> +    return false;

The 'exactly' caught my eye.  Per a glance at comments in
extract_real_truncate this indicates that the value is identical after
conversion to the new format.   Ok.


> +
> +  long value;
> +  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
> +
> +  /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero).  */
> +  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
> +    return false;
> +
> +  *constant_ptr = value;
> +  return true;
> +}

ok


> +
>  const char *
>  output_vec_const_move (rtx *operands)
>  {
> @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands)
>      {
>        bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
>        int xxspltib_value = 256;
> +      HOST_WIDE_INT xxspltidp_value = 0;
>        int num_insns = -1;
> 
>        if (zero_constant (vec, mode))
> @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands)
>  	    gcc_unreachable ();
>  	}
> 
> +      if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
> +	{
> +	  operands[2] = GEN_INT (xxspltidp_value);
> +	  return "xxspltidp %x0,%2";
> +	}
> +
>        if (TARGET_P9_VECTOR
>  	  && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
>  	{

ok

> @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn)
>    return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
>  }
> 
> +/* Whether a permute type instruction is a prefixed instruction.  This is
> +   called from the prefixed attribute processing.  */
> +
> +bool
> +prefixed_permute_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +    return false;
> +
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  machine_mode mode = GET_MODE (dest);
> +
> +  if (!REG_P (dest) && !SUBREG_P (dest))
> +    return false;
> +
> +  switch (mode)
> +    {
> +    case DFmode:
> +    case SFmode:
> +    case V2DFmode:
> +      return xxspltidp_operand (src, mode);
> +
> +    default:
> +      break;
> +    }
> +
> +  return false;
> +}
> +
ok


>  /* Whether the next instruction needs a 'p' prefix issued before the
>     instruction is printed out.  */
>  static bool prepend_p_to_next_insn;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a84438f8545..bf3bfed3b88 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
> 
>  	 (eq_attr "type" "integer,add")
>  	 (if_then_else (match_test "prefixed_paddi_p (insn)")
> +		       (const_string "yes")
> +		       (const_string "no"))
> +
> +	 (eq_attr "type" "vecperm")
> +	 (if_then_else (match_test "prefixed_permute_p (insn)")
>  		       (const_string "yes")
>  		       (const_string "no"))]
> 
> @@ -7723,17 +7728,17 @@ (define_split
>  ;;
>  ;;	LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
>  ;;	STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
> -;;	MR           MT<x>      MF<x>       NOP
> +;;	MR           MT<x>      MF<x>       NOP        XXSPLTIDP
> 
>  (define_insn "movsf_hardfloat"
>    [(set (match_operand:SF 0 "nonimmediate_operand"
>  	 "=!r,       f,         v,          wa,        m,         wY,
>  	  Z,         m,         wa,         !r,        f,         wa,
> -	  !r,        *c*l,      !r,         *h")
> +	  !r,        *c*l,      !r,         *h,        wa")
>  	(match_operand:SF 1 "input_operand"
>  	 "m,         m,         wY,         Z,         f,         v,
>  	  wa,        r,         j,          j,         f,         wa,
> -	  r,         r,         *h,         0"))]
> +	  r,         r,         *h,         0,         eF"))]
>    "(register_operand (operands[0], SFmode)
>     || register_operand (operands[1], SFmode))
>     && TARGET_HARD_FLOAT
> @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
>     mr %0,%1
>     mt%0 %1
>     mf%1 %0
> -   nop"
> +   nop
> +   #"
>    [(set_attr "type"
>  	"load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
>  	 fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
> -	 *,          mtjmpr,    mfjmpr,     *")
> +	 *,          mtjmpr,    mfjmpr,     *,         vecperm")
>     (set_attr "isa"
>  	"*,          *,         p9v,        p8v,       *,         p9v,
>  	 p8v,        *,         *,          *,         *,         *,
> -	 *,          *,         *,          *")])
> +	 *,          *,         *,          *,         p10")])

OK, i think.   The addition of vecperm for type and p10 for the isa
entries catch my eye, but I expect this is obvious to others.  

> 
>  ;;	LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
>  ;;	FMR          MR         MT%0       MF%1       NOP
> @@ -8023,18 +8029,18 @@ (define_split
> 
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
> -;;           LWZ          STW         MR
> +;;           LWZ          STW         MR          XXSPLTIDP
> 
> 
>  (define_insn "*mov<mode>_hardfloat32"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>              "=m,          d,          d,          <f64_p9>,   wY,
>                <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
> -              Y,          r,          !r")
> +              Y,          r,          !r,         wa")
>  	(match_operand:FMOVE64 1 "input_operand"
>               "d,          m,          d,          wY,         <f64_p9>,
>                Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> -              r,          Y,          r"))]
> +              r,          Y,          r,          eF"))]
>    "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
>     #
>     #
>     #
> +   #
>     #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, two,
> -             store,       load,       two")
> +             store,       load,       two,        vecperm")
>     (set_attr "size" "64")
>     (set_attr "length"
>              "*,           *,          *,          *,          *,
>               *,           *,          *,          *,          8,
> -             8,           8,          8")
> +             8,           8,          8,          *")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
> -             *,           *,          *")])
> +             *,           *,          *,          p10")])
> 
>  ;;           STW      LWZ     MR      G-const H-const F-const
> 
> @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
>  ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
> -;;           NOP          MFVSRD      MTVSRD
> +;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
> 
>  (define_insn "*mov<mode>_hardfloat64"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>             "=m,           d,          d,          <f64_p9>,   wY,
>               <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
>               YZ,          r,          !r,         *c*l,       !r,
> -            *h,           r,          <f64_dm>")
> +            *h,           r,          <f64_dm>,   wa")
>  	(match_operand:FMOVE64 1 "input_operand"
>              "d,           m,          d,          wY,         <f64_p9>,
>               Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
>               r,           YZ,         r,          r,          *h,
> -             0,           <f64_dm>,   r"))]
> +             0,           <f64_dm>,   r,          eF"))]
>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
>     mf%1 %0
>     nop
>     mfvsrd %0,%x1
> -   mtvsrd %x0,%1"
> +   mtvsrd %x0,%1
> +   #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, integer,
>               store,       load,       *,          mtjmpr,     mfjmpr,
> -             *,           mfvsr,      mtvsr")
> +             *,           mfvsr,      mtvsr,      vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
>               *,           *,          *,          *,          *,
> -             *,           p8v,        p8v")])
> +             *,           p8v,        p8v,        p10")])
> 
>  ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
>  ;;           H-const  F-const  Special


Ok.

> @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
>     (set_attr "length"
>              "*,       *,      *,      *,      *,      8,
>               12,      16,     *")])
> +
>  

Unnecessarily blank line?


>  (define_expand "mov<mode>"
>    [(set (match_operand:FMOVE128 0 "general_operand")
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 0538db387dc..928c4fafe07 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
>  mprivileged
>  Target Var(rs6000_privileged) Init(0)
>  Generate code that will run in privileged state.
> +
> +mxxspltidp
> +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
> +Generate (do not generate) XXSPLTIDP instructions.


Ok.


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..af9a04870d4 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
>  ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
> 
>  ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
> +;;		XXSPLTIDP
>  ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
>  ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
>  (define_insn "vsx_mov<mode>_64bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        r,         we,        ?wQ,
> +                wa,
>                  ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
>                  ?wa,       v,         <??r>,     wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        we,        r,         r,
> +                eF,
>                  wQ,        Y,         r,         r,         wE,        jwM,
>                  ?jwM,      W,         <nW>,      v,         wZ"))]
> 
> @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
> +                vecperm,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
>                  vecsimple, *,         *,         vecstore,  vecload")
>     (set_attr "num_insns"
>                 "*,         *,         *,         2,         *,         2,
> +                *,
>                  2,         2,         2,         2,         *,         *,
>                  *,         5,         2,         *,         *")
>     (set_attr "max_prefixed_insns"
>                 "*,         *,         *,         *,         *,         2,
> +                *,
>                  2,         2,         2,         2,         *,         *,
>                  *,         *,         *,         *,         *")
>     (set_attr "length"
>                 "*,         *,         *,         8,         *,         8,
> +                *,
>                  8,         8,         8,         8,         *,         *,
>                  *,         20,        8,         *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> +                p10,
>                  *,         *,         *,         *,         p9v,       *,
>                  <VSisa>,   *,         *,         *,         *")])
> 
>  ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
> +;;		XXSPLTIDP
>  ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
>  ;;              LVX (VMX)  STVX (VMX)
>  (define_insn "*vsx_mov<mode>_32bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
> +                wa,
>                  wa,        v,         ?wa,       v,         <??r>,
>                  wZ,        v")
> 
>  	(match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        Y,         r,         r,
> +                eF,
>                  wE,        jwM,       ?jwM,      W,         <nW>,
>                  v,         wZ"))]
> 
> @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, load,      store,    *,
> +                vecperm,
>                  vecsimple, vecsimple, vecsimple, *,         *,
>                  vecstore,  vecload")
>     (set_attr "length"
>                 "*,         *,         *,         16,        16,        16,
> +                *,
>                  *,         *,         *,         20,        16,
>                  *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> +                p10,
>                  p9v,       *,         <VSisa>,   *,         *,
>                  *,         *")])
> 
ok

> @@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg"
>     mtvsrdd %x0,%1,%1"
>    [(set_attr "type" "vecperm,vecmove")])
> 
> +(define_insn "*vsx_splat_v2df_xxspltidp"
> +  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
> +	(vec_duplicate:V2DF
> +	 (match_operand:DF 1 "xxspltidp_operand" "eF")))]
> +  "TARGET_POWER10"
> +{
> +  HOST_WIDE_INT value;
> +
> +  if (!xxspltidp_constant_p (operands[1], DFmode, &value))
> +    gcc_unreachable ();
> +
> +  operands[2] = GEN_INT (value);
> +  return "xxspltidp %x0,%1";
> +}
> +  [(set_attr "type" "vecperm")
> +   (set_attr "prefixed" "yes")])
> +
>  (define_insn "vsx_splat_<mode>_mem"
>    [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
>  	(vec_duplicate:VSX_D
> @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df"
>    DONE;
>  })
> 
> -(define_insn "xxspltidp_v2df_inst"
> -  [(set (match_operand:V2DF 0 "register_operand" "=wa")
> -	(unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> -		     UNSPEC_XXSPLTIDP))]
> +(define_mode_iterator XXSPLTIDP [SF DF V2DF])
> +
> +(define_insn "xxspltidp_<mode>_inst"
> +  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
> +	(unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +			  UNSPEC_XXSPLTIDP))]
>    "TARGET_POWER10"
>    "xxspltidp %x0,%1"
>    [(set_attr "type" "vecperm")
>     (set_attr "prefixed" "yes")])
> 
> +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar
> +;; constants and V2DF vector constants where both elements are the same.  The
> +;; constant has to be expressible as a SFmode constant that is not a SFmode
> +;; denormal value.
> +(define_insn_and_split "*xxspltidp_<mode>_internal"
> +  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
> +	(match_operand:XXSPLTIDP 1 "xxspltidp_operand"     "eF"))]

Extra spaces there.


> +  "TARGET_POWER10"
> +  "#"
> +  "&& 1"
> +  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
> +	(unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
> +{
> +  HOST_WIDE_INT value = 0;
> +
> +  if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
> +    gcc_unreachable ();
> +
> +  operands[2] = GEN_INT (value);
> +}
> + [(set_attr "type" "vecperm")
> +  (set_attr "prefixed" "yes")])
> +
>  ;; XXSPLTI32DX built-in function support
>  (define_expand "xxsplti32dx_v4si"
>    [(set (match_operand:V4SI 0 "register_operand" "=wa")

ok


Just briefly looed at testcases.. nothing jumped out at me below.
Thanks
-Will

> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> new file mode 100644
> index 00000000000..8f6e176f9af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +double
> +scalar_double_0 (void)
> +{
> +  return 0.0;			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +double
> +scalar_double_1 (void)
> +{
> +  return 1.0;			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +double
> +scalar_double_m0 (void)
> +{
> +  return -0.0;			/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_nan (void)
> +{
> +  return __builtin_nan ("");	/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_inf (void)
> +{
> +  return __builtin_inf ();	/* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_m_inf (void)	/* XXSPLTIDP.  */
> +{
> +  return - __builtin_inf ();
> +}
> +#endif
> +
> +double
> +scalar_double_pi (void)
> +{
> +  return M_PI;			/* PLFD.  */
> +}
> +
> +double
> +scalar_double_denorm (void)
> +{
> +  return 0x1p-149f;		/* PLFD.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> new file mode 100644
> index 00000000000..72504bdfbbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +float
> +scalar_float_0 (void)
> +{
> +  return 0.0f;			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +float
> +scalar_float_1 (void)
> +{
> +  return 1.0f;			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +float
> +scalar_float_m0 (void)
> +{
> +  return -0.0f;			/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_nan (void)
> +{
> +  return __builtin_nanf ("");	/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_inf (void)
> +{
> +  return __builtin_inff ();	/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_m_inf (void)	/* XXSPLTIDP.  */
> +{
> +  return - __builtin_inff ();
> +}
> +#endif
> +
> +float
> +scalar_float_pi (void)
> +{
> +  return (float)M_PI;		/* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_denorm (void)
> +{
> +  return 0x1p-149f;		/* PLFS.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> new file mode 100644
> index 00000000000..82ffc86f8aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> @@ -0,0 +1,64 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +vector double
> +v2df_double_0 (void)
> +{
> +  return (vector double) { 0.0, 0.0 };			/* XXSPLTIB or XXLXOR.  */
> +}
> +
> +vector double
> +v2df_double_1 (void)
> +{
> +  return (vector double) { 1.0, 1.0 };			/* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +vector double
> +v2df_double_m0 (void)
> +{
> +  return (vector double) { -0.0, -0.0 };		/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_nan (void)
> +{
> +  return (vector double) { __builtin_nan (""),
> +			   __builtin_nan ("") };	/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_inf (void)
> +{
> +  return (vector double) { __builtin_inf (),
> +			   __builtin_inf () };		/* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_m_inf (void)
> +{
> +  return (vector double) { - __builtin_inf (),
> +			   - __builtin_inf () };	/* XXSPLTIDP.  */
> +}
> +#endif
> +
> +vector double
> +v2df_double_pi (void)
> +{
> +  return (vector double) { M_PI, M_PI };		/* PLVX.  */
> +}
> +
> +vector double
> +v2df_double_denorm (void)
> +{
> +  return (vector double) { (double)0x1p-149f,
> +			   (double)0x1p-149f };		/* PLVX.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> -- 
> 2.31.1
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-26 19:17 ` will schmidt
@ 2021-08-26 21:28   ` Michael Meissner
  2021-08-27  2:21     ` Michael Meissner
  2021-08-31 22:52     ` Segher Boessenkool
  0 siblings, 2 replies; 10+ messages in thread
From: Michael Meissner @ 2021-08-26 21:28 UTC (permalink / raw)
  To: will schmidt
  Cc: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote:
> On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> > Generate XXSPLTIDP on power10.
> > 
> > I have added a temporary switch (-mxxspltidp) to control whether or not the
> > XXSPLTIDP instruction is generated.
> 
> How temporary?  

Until we decide we no longer need to disable the option to do tests.  Probably
at the end of stage1.

> > 
> > I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> > constants.

...

> > gcc/
> > 	* config/rs6000/constraints.md (eF): New constraint.
> > 	* config/rs6000/predicates.md (easy_fp_constant): If we can load
> > 	the scalar constant with XXSPLTIDP, the floating point constant is
> > 	easy.
> 
> Could be shortened to something like ? 
>   Add clause to accept xxspltidp_operand as easy.

Sure.

> > 	(xxspltidp_operand): New predicate.
> 
> Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
> V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
> but defer..

This is the convention I've used for adding other instructions like xxspltib.
You use the name of the instruction followed by operand.  The predicate only
matches RTXes that we would use the XXSPLTIDP for.  And then for operands like
XXSPLTIDP and XXSPLTIB where you want additional information, there is the C++
function in rs6000.c that has the additional arguments.  In the case of
XXSPLTIDP, it returns the 32-bit immediate value that is used in the
instruction.  The predicate calls this internal function, adding the address of
internal variables that aren't used in this case.

This way we have just one place that centralizes the knowledge about the
instruction.  This means that the places that deal with decomposing the
XXSPLTIDP instruction don't have to do their own parsing.  And if we need to
add new enhancements or restrictions, there is only one place to go.

There is the XXSPLTI32DX instruction that can also load SFmode, DFmode, and
V2DFmode constants.  You use a pair of instructions to either fill in the top
32-bits or the bottom 32-bits of each 64-bit element.

I have patches for adding XXSPLTI32DX, but so far, I'm not sure whether it is a
win or not.  This has the xxsplti32dx_operand predicate and the
xxsplti32dx_constant_p internal function and a separate constraint ("eD") for
matching it.  If xxspltidp_constant_p returns false, then xxsplti32dx_operand
and xxsplti32dx_constant_p return false.  I.e. it only returns true if we
should use XXSPLTI32DX instead of some other instruction.

You can't combine XXSPLTIDP and XXSPLTI32DX because you have to set the various
attributes differently (XXSPLTIDP is a single prefixed instruction, while
before split, XXSPLTI32DX is two prefixed instructions).

There is another instruction (XXSPLTIW) that can be used for loading up certain
V16QImode, V8HImode, V4SImode, and V4SFmode constants.  I have patches for this
as well.  At the moment, there are 1-2 regressions if I enable XXSPLTIW, and
I'm trying to figure out how to improve them.  But it likely will be submitted
as a future patch also.  This would have the xxspltiw_operand predicate, "eW"
constraint and the xxspltiw_constant_p internal function .

> 
> > 	(easy_vector_constant): If we can generate XXSPLTIDP, mark the
> > 	vector constant as easy.
> 
> Duplicated from above.

Yep, both of the easy_*_functions need to call this.  The easy_fp_function is
for scalar floating point (i.e. SFmode and DFmode).  The easy_vector_function
is for vector (i.e. V2DFmode).

What the easy_{fp,vector}_constant functions are used for is to decide whether
we can load up the constant via insns that are created via define_splits or if
we need to push the constant to memory (because it isn't 'easy' to load up the
constants).  These two functions predate my current involvement with PowerPC on
GCC (i.e. 10 years or so).  I don't recall if they were used back in the 1990's
when I worked on PowerPC at Cygnus solutions.

> 
> > 	* config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
> > 	declaration.
> > 	(prefixed_permute_p): Likewise.
> 
> 
> > 	* config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
> > 	(output_vec_const_move): Add support for XXSPLTIDP.
> > 	(prefixed_permute_p): New function.
> 
> Duplicated.

Yes.  You need the external declaration in rs6000-protos.h and the definition
in rs6000.c.  So you will have the duplication.

> > 	* config/rs6000/rs6000.md (prefixed attribute): Add support for
> > 	permute prefixed instructions.
> > 	(movsf_hardfloat): Add XXSPLTIDP support.
> > 	(mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
> > 	(mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
> > 	* config/rs6000/rs6000.opt (-mxxspltidp): New switch.
> > 	* config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
> > 	support.
> > 	(vsx_move<mode>_32bit): Likewise.
> 
> No e in mov (per patch contents below).

Thanks.

> > 	(vsx_splat_v2df_xxspltidp): New insn.
> > 	(XXSPLTIDP): New mode iterator.
> > 	(xxspltidp_<mode>_internal): New insn and splits.
> > 	(xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
> > 	iterated form that also does SFmode, and DFmode.
> Swap "an iterated form" with "xxspltidp_<mode>_inst  ?

Ok.

> > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
> >     (set_attr "length"
> >              "*,       *,      *,      *,      *,      8,
> >               12,      16,     *")])
> > +
> >  
> 
> Unnecessarily blank line?

Probably.

> >  (define_expand "mov<mode>"
> >    [(set (match_operand:FMOVE128 0 "general_operand")
> > diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> > index 0538db387dc..928c4fafe07 100644
> > --- a/gcc/config/rs6000/rs6000.opt
> > +++ b/gcc/config/rs6000/rs6000.opt
> > @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
> >  mprivileged
> >  Target Var(rs6000_privileged) Init(0)
> >  Generate code that will run in privileged state.
> > +
> > +mxxspltidp
> > +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
> > +Generate (do not generate) XXSPLTIDP instructions.
> 
> 
> Ok.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-26 21:28   ` Michael Meissner
@ 2021-08-27  2:21     ` Michael Meissner
  2021-08-31 22:52     ` Segher Boessenkool
  1 sibling, 0 replies; 10+ messages in thread
From: Michael Meissner @ 2021-08-27  2:21 UTC (permalink / raw)
  To: Michael Meissner, will schmidt, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner

On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote:
> I have patches for adding XXSPLTI32DX, but so far, I'm not sure whether it is a
> win or not.  This has the xxsplti32dx_operand predicate and the
> xxsplti32dx_constant_p internal function and a separate constraint ("eD") for
> matching it.  If xxspltidp_constant_p returns false, then xxsplti32dx_operand
> and xxsplti32dx_constant_p return false.  I.e. it only returns true if we
> should use XXSPLTI32DX instead of some other instruction.

Whoops in re-reading that paragraph, I mixed up the return value.  The sentence
should have read:

If xxspltidp_constant_p (and xxspltidp_operand) returns "true" then
xxsplti32dx_constant_p (and xxsplti32dx_operand) would return false.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-26 21:28   ` Michael Meissner
  2021-08-27  2:21     ` Michael Meissner
@ 2021-08-31 22:52     ` Segher Boessenkool
  2021-09-01 20:06       ` Michael Meissner
  1 sibling, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2021-08-31 22:52 UTC (permalink / raw)
  To: Michael Meissner, will schmidt, gcc-patches, David Edelsohn,
	Bill Schmidt, Peter Bergner

Hi!

On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote:
> On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote:
> > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> > > Generate XXSPLTIDP on power10.
> > > 
> > > I have added a temporary switch (-mxxspltidp) to control whether or not the
> > > XXSPLTIDP instruction is generated.
> > 
> > How temporary?  
> 
> Until we decide we no longer need to disable the option to do tests.  Probably
> at the end of stage1.

Don't do it at all please.  If it is useful to disable some new strategy
for generating constants, a (temporary or at least undocumented) flag
for that can be handy.  But a flag to disable separate insns is a
liability, it makes the compiler much more fragile, makes changing the
compiler hard because of all the surprises hidden.

> > > 	(xxspltidp_operand): New predicate.
> > 
> > Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
> > V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
> > but defer..
> 
> This is the convention I've used for adding other instructions like xxspltib.

The only reason it is a good idea here is because of the strange
behaviour this insn has with single precision subnormals.  In general
a better name here would be something like "sf_as_int_operand".  The
insn should probably not allow anything else than bit patterns, not
floating point constants, have a separate pattern for that (that can
then forward to the integer one).

> This way we have just one place that centralizes the knowledge about the
> instruction.

That one place should be the define_insn for it.

Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-31 22:52     ` Segher Boessenkool
@ 2021-09-01 20:06       ` Michael Meissner
  0 siblings, 0 replies; 10+ messages in thread
From: Michael Meissner @ 2021-09-01 20:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, will schmidt, gcc-patches, David Edelsohn,
	Bill Schmidt, Peter Bergner

On Tue, Aug 31, 2021 at 05:52:48PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> On Thu, Aug 26, 2021 at 05:28:42PM -0400, Michael Meissner wrote:
> > On Thu, Aug 26, 2021 at 02:17:57PM -0500, will schmidt wrote:
> > > On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> > > > Generate XXSPLTIDP on power10.
> > > > 
> > > > I have added a temporary switch (-mxxspltidp) to control whether or not the
> > > > XXSPLTIDP instruction is generated.
> > > 
> > > How temporary?  
> > 
> > Until we decide we no longer need to disable the option to do tests.  Probably
> > at the end of stage1.
> 
> Don't do it at all please.  If it is useful to disable some new strategy
> for generating constants, a (temporary or at least undocumented) flag
> for that can be handy.  But a flag to disable separate insns is a
> liability, it makes the compiler much more fragile, makes changing the
> compiler hard because of all the surprises hidden.
> 
> > > > 	(xxspltidp_operand): New predicate.
> > > 
> > > Will there ever be another instruction using the SF/DF CONST_DOUBLE  or
> > > V2DF CONST_VECTOR ?   I tentatively question the name of the operand,
> > > but defer..
> > 
> > This is the convention I've used for adding other instructions like xxspltib.
> 
> The only reason it is a good idea here is because of the strange
> behaviour this insn has with single precision subnormals.  In general
> a better name here would be something like "sf_as_int_operand".  The
> insn should probably not allow anything else than bit patterns, not
> floating point constants, have a separate pattern for that (that can
> then forward to the integer one).
> 
> > This way we have just one place that centralizes the knowledge about the
> > instruction.
> 
> That one place should be the define_insn for it.

But you need the predicate and constraint for this to work.  That is what I was
talking about.  The define_insn is simple with this current method.  Otherwise,
the define_insn has to reparse the RTL.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Ping: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-25 19:46 [PATCH] Generate XXSPLTIDP on power10 Michael Meissner
  2021-08-26 19:17 ` will schmidt
@ 2021-08-31 14:50 ` Michael Meissner
  2021-08-31 23:41 ` Segher Boessenkool
  2 siblings, 0 replies; 10+ messages in thread
From: Michael Meissner @ 2021-08-31 14:50 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, Segher Boessenkool,
	David Edelsohn, Bill Schmidt, Peter Bergner, Will Schmidt

Ping patch.

| Date: Wed, 25 Aug 2021 15:46:43 -0400
| Subject: [PATCH] Generate XXSPLTIDP on power10.
| Message-ID: <YSaeI5BVUxofem9y@toto.the-meissners.org>

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-25 19:46 [PATCH] Generate XXSPLTIDP on power10 Michael Meissner
  2021-08-26 19:17 ` will schmidt
  2021-08-31 14:50 ` Ping: " Michael Meissner
@ 2021-08-31 23:41 ` Segher Boessenkool
  2021-09-01 20:22   ` Michael Meissner
  2 siblings, 1 reply; 10+ messages in thread
From: Segher Boessenkool @ 2021-08-31 23:41 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

Hi!

Please do two separate patches.  The first that adds the instruction
(with a bit pattern, i.e. integer, input), and perhaps a second pattern
that has an fp as input and uses it if the constant is valid for the
insn (survives being converted to SP and back to DP (or the other way
around), and is not denormal).  That can be two patches if you want,
but :-)

Having the integer intermediate step not only makes the code hugely less
complicated, but is also allows e.g.

===
typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
v2u64 f(void)
{
        v2u64 x = { 0x8000000000000000, 0x8000000000000000 };
        return x;
}
===

to be optimised properly.

The second part is letting the existing code use such FP (and integer!)
contants.

On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote:
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> +  (match_operand 0 "xxspltidp_operand"))

vector *or float*.  It should allow all vectors, not just FP ones.

> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> +;; loaded via the ISA 3.1 XXSPLTIDP instruction.
> +(define_predicate "xxspltidp_operand"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  HOST_WIDE_INT value = 0;
> +  return xxspltidp_constant_p (op, mode, &value);
> +})

Don't do that.  Factor the code properly.  A predicate function should
never have side effects.

Since this is the only place you want to convert the value to its bit
pattern, you should just do that here.

(Btw, initialising the value (although the function always writes it) is
not defensive programming, it is hiding bugs.  IMNSHO :-) )

> +bool
> +xxspltidp_constant_p (rtx op,
> +		      machine_mode mode,
> +		      HOST_WIDE_INT *constant_ptr)
> +{
> +  *constant_ptr = 0;

And a second time, too!  Don't do either.

> +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> +    return false;

This is the wrong place to test these.  It belongs in the caller.

> +      if (CONST_VECTOR_P (op))
> +	{
> +	  element = CONST_VECTOR_ELT (op, 0);
> +	  if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> +	    return false;
> +	}

const_vec_duplicate_p

(But you actually should check if the bit pattern is valid, nothing
more, nothing less).

> +  /* Don't return true for 0.0 since that is easy to create without
> +     XXSPLTIDP.  */
> +  if (element == CONST0_RTX (mode))
> +    return false;

Don't do that.  Instead have whatever decides what insn to use choose
more directly.

> +/* Whether a permute type instruction is a prefixed instruction.  This is
> +   called from the prefixed attribute processing.  */
> +
> +bool
> +prefixed_permute_p (rtx_insn *insn)

What does this have to do with this patch?

> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +    return false;
> +
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  machine_mode mode = GET_MODE (dest);
> +
> +  if (!REG_P (dest) && !SUBREG_P (dest))
> +    return false;
> +
> +  switch (mode)
> +    {
> +    case DFmode:
> +    case SFmode:
> +    case V2DFmode:
> +      return xxspltidp_operand (src, mode);

??!!??

That is not a permute insn at all.

Perhaps you mean it is executed in the PM pipe on current
implementations (all one of-em).  That does not make it a permute insn.
It is not a good idea to call insns that do not have semantics similar
to permutations "permute".

> @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
> @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
> @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"

It would be a good idea to merge many of these patterns again.  We can
do this now that we have the "isa" and "enabled" attributes.

Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-08-31 23:41 ` Segher Boessenkool
@ 2021-09-01 20:22   ` Michael Meissner
  2021-09-02 18:38     ` Segher Boessenkool
  0 siblings, 1 reply; 10+ messages in thread
From: Michael Meissner @ 2021-09-01 20:22 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote:
> Hi!
> 
> Please do two separate patches.  The first that adds the instruction
> (with a bit pattern, i.e. integer, input), and perhaps a second pattern
> that has an fp as input and uses it if the constant is valid for the
> insn (survives being converted to SP and back to DP (or the other way
> around), and is not denormal).  That can be two patches if you want,
> but :-)

Well we already have the two stages for the built-in.  But the point of the
work is to make it work without the explicit builtin.

> Having the integer intermediate step not only makes the code hugely less
> complicated, but is also allows e.g.


> ===
> typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
> v2u64 f(void)
> {
>         v2u64 x = { 0x8000000000000000, 0x8000000000000000 };
>         return x;
> }
> ===
> 
> to be optimised properly.
> 
> The second part is letting the existing code use such FP (and integer!)
> contants.

It is somewhat harder to do this as integer values, because you have to check
whether all of the bits are correct, including skipping the SF denormal bits.

I tend to think in real life, the only win will be be -0.0
(i.e. 0x8000000000000000 that you mentioned).  I doubt many V2DI constants will
be appropriate canidates for this.

> 
> On Wed, Aug 25, 2021 at 03:46:43PM -0400, Michael Meissner wrote:
> > +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> > +(define_constraint "eF"
> > +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> > +  (match_operand 0 "xxspltidp_operand"))
> 
> vector *or float*.  It should allow all vectors, not just FP ones.
> 
> > +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> > +;; loaded via the ISA 3.1 XXSPLTIDP instruction.
> > +(define_predicate "xxspltidp_operand"
> > +  (match_code "const_double,const_vector,vec_duplicate")
> > +{
> > +  HOST_WIDE_INT value = 0;
> > +  return xxspltidp_constant_p (op, mode, &value);
> > +})
> 
> Don't do that.  Factor the code properly.  A predicate function should
> never have side effects.
> 
> Since this is the only place you want to convert the value to its bit
> pattern, you should just do that here.
> 
> (Btw, initialising the value (although the function always writes it) is
> not defensive programming, it is hiding bugs.  IMNSHO :-) )

And avoiding warnings.

> 
> > +bool
> > +xxspltidp_constant_p (rtx op,
> > +		      machine_mode mode,
> > +		      HOST_WIDE_INT *constant_ptr)
> > +{
> > +  *constant_ptr = 0;
> 
> And a second time, too!  Don't do either.
> 
> > +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> > +    return false;
> 
> This is the wrong place to test these.  It belongs in the caller.
> 
> > +      if (CONST_VECTOR_P (op))
> > +	{
> > +	  element = CONST_VECTOR_ELT (op, 0);
> > +	  if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> > +	    return false;
> > +	}
> 
> const_vec_duplicate_p

Ok.

> (But you actually should check if the bit pattern is valid, nothing
> more, nothing less).
> 
> > +  /* Don't return true for 0.0 since that is easy to create without
> > +     XXSPLTIDP.  */
> > +  if (element == CONST0_RTX (mode))
> > +    return false;
> 
> Don't do that.  Instead have whatever decides what insn to use choose
> more directly.
> 
> > +/* Whether a permute type instruction is a prefixed instruction.  This is
> > +   called from the prefixed attribute processing.  */
> > +
> > +bool
> > +prefixed_permute_p (rtx_insn *insn)
> 
> What does this have to do with this patch?

This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx}
instructions are prefixed by default, but unlike paddi, they don't want a
leading 'p' in the instruction.

The alternative is for each of the moves to add an explicit prefixed attribute,
instead of letting the default be set.


> > +{
> > +  rtx set = single_set (insn);
> > +  if (!set)
> > +    return false;
> > +
> > +  rtx dest = SET_DEST (set);
> > +  rtx src = SET_SRC (set);
> > +  machine_mode mode = GET_MODE (dest);
> > +
> > +  if (!REG_P (dest) && !SUBREG_P (dest))
> > +    return false;
> > +
> > +  switch (mode)
> > +    {
> > +    case DFmode:
> > +    case SFmode:
> > +    case V2DFmode:
> > +      return xxspltidp_operand (src, mode);
> 
> ??!!??
> 
> That is not a permute insn at all.
> 
> Perhaps you mean it is executed in the PM pipe on current
> implementations (all one of-em).  That does not make it a permute insn.
> It is not a good idea to call insns that do not have semantics similar
> to permutations "permute".

I'll come up with a different name.  But because it includes SPLAT-ing to the
other double word, I would tend to believe these instructions will always be a
permute class instruction.

> 
> > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
> > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
> > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
> > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
> 
> It would be a good idea to merge many of these patterns again.  We can
> do this now that we have the "isa" and "enabled" attributes.

I don't plan to do this right now.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.ibm.com, phone: +1 (978) 899-4797

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] Generate XXSPLTIDP on power10.
  2021-09-01 20:22   ` Michael Meissner
@ 2021-09-02 18:38     ` Segher Boessenkool
  0 siblings, 0 replies; 10+ messages in thread
From: Segher Boessenkool @ 2021-09-02 18:38 UTC (permalink / raw)
  To: Michael Meissner, gcc-patches, David Edelsohn, Bill Schmidt,
	Peter Bergner, Will Schmidt

On Wed, Sep 01, 2021 at 04:22:13PM -0400, Michael Meissner wrote:
> On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote:
> > Hi!
> > 
> > Please do two separate patches.  The first that adds the instruction
> > (with a bit pattern, i.e. integer, input), and perhaps a second pattern
> > that has an fp as input and uses it if the constant is valid for the
> > insn (survives being converted to SP and back to DP (or the other way
> > around), and is not denormal).  That can be two patches if you want,
> > but :-)
> 
> Well we already have the two stages for the built-in.  But the point of the
> work is to make it work without the explicit builtin.

There should be a define_insn that works with just as just a bit
pattern, without floating point in sight.

> > Having the integer intermediate step not only makes the code hugely less
> > complicated, but is also allows e.g.
> 
> 
> > ===
> > typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
> > v2u64 f(void)
> > {
> >         v2u64 x = { 0x8000000000000000, 0x8000000000000000 };
> >         return x;
> > }
> > ===
> > 
> > to be optimised properly.
> > 
> > The second part is letting the existing code use such FP (and integer!)
> > contants.
> 
> It is somewhat harder to do this as integer values, because you have to check
> whether all of the bits are correct, including skipping the SF denormal bits.

You have to *anyway*.  And it is easy enough.

> I tend to think in real life, the only win will be be -0.0
> (i.e. 0x8000000000000000 that you mentioned).  I doubt many V2DI constants will
> be appropriate canidates for this.

There are a few more.  But the point is that the code is *simpler* like
this.  It usually is -- shortcuts often take more time in the end.

> > (Btw, initialising the value (although the function always writes it) is
> > not defensive programming, it is hiding bugs.  IMNSHO :-) )
> 
> And avoiding warnings.

Shutting up warnings that your control flow is not good is not a good
idea usually.  Don't fight the compiler: it does not matter if you have
better taste than it, it will win.  Instead, write better control flow.
There might even be an actual bug hiding in there!

> > > +/* Whether a permute type instruction is a prefixed instruction.  This is
> > > +   called from the prefixed attribute processing.  */
> > > +
> > > +bool
> > > +prefixed_permute_p (rtx_insn *insn)
> > 
> > What does this have to do with this patch?
> 
> This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx}
> instructions are prefixed by default, but unlike paddi, they don't want a
> leading 'p' in the instruction.

They are not permute insns, that's where your code turns into an enigma.

You should not have a function for prefixed insns that are not a variant
of non-prefixed insns: that is *most* prefixed insns (eventually it will
be, right now you see memory insns mostly of course).  Pick out the
special case, instead (hint: if it has a "not" in the description, you
probably picked the wrong side).

So you have an "is_prefixed" thing, all prefixed insns, and an
"is_prefixed_variant" (or some better name :-) ) function, that returns
true for those prefixed insns written with a leading "p" on an existing
base insn (and actually have the original opcode as suffix).

> > > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> > > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
> > > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
> > > @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
> > > @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
> > 
> > It would be a good idea to merge many of these patterns again.  We can
> > do this now that we have the "isa" and "enabled" attributes.
> 
> I don't plan to do this right now.

The longer you wait the more work it becomes, and the more work other
things (like this patch) become as well.

Segher

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2021-09-02 18:39 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-08-25 19:46 [PATCH] Generate XXSPLTIDP on power10 Michael Meissner
2021-08-26 19:17 ` will schmidt
2021-08-26 21:28   ` Michael Meissner
2021-08-27  2:21     ` Michael Meissner
2021-08-31 22:52     ` Segher Boessenkool
2021-09-01 20:06       ` Michael Meissner
2021-08-31 14:50 ` Ping: " Michael Meissner
2021-08-31 23:41 ` Segher Boessenkool
2021-09-01 20:22   ` Michael Meissner
2021-09-02 18:38     ` Segher Boessenkool

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).