From: will schmidt <will_schmidt@vnet.ibm.com>
To: Michael Meissner <meissner@linux.ibm.com>,
gcc-patches@gcc.gnu.org,
Segher Boessenkool <segher@kernel.crashing.org>,
David Edelsohn <dje.gcc@gmail.com>,
Bill Schmidt <wschmidt@linux.ibm.com>,
Peter Bergner <bergner@linux.ibm.com>
Subject: Re: [PATCH] Generate XXSPLTIDP on power10.
Date: Thu, 26 Aug 2021 14:17:57 -0500
Message-ID: <f83b1e45fae0ddbabf3d0322362a23c92bd67481.camel@vnet.ibm.com>
In-Reply-To: <YSaeI5BVUxofem9y@toto.the-meissners.org>
On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP on power10.
>
> This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF
> vector constants. The XXSPLTIDP instruction is given a 32-bit immediate that
> is converted to a vector of two DFmode constants. The immediate is in SFmode
> format, so only constants that fit as SFmode values can be loaded with
> XXSPLTIDP.
ok
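To make the encoding concrete, here is a minimal sketch in plain C (not GCC
internals) of what the 32-bit immediate looks like for a given
single-precision value. The helper name sf_bits is hypothetical, and the
sketch assumes the host float is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of F.  Assumes float is IEEE 754 binary32;
   sf_bits is a hypothetical helper, not something from the patch.  */
static uint32_t
sf_bits (float f)
{
  uint32_t bits;
  /* memcpy avoids the aliasing problems of a pointer cast.  */
  memcpy (&bits, &f, sizeof bits);
  return bits;
}
```

Under that assumption, sf_bits (1.0f) is 0x3F800000, which would be the
immediate used to load 1.0.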
>
> I added a new constraint (eF) to match constants that can be loaded with the
> XXSPLTIDP instruction.
>
> I have added a temporary switch (-mxxspltidp) to control whether or not the
> XXSPLTIDP instruction is generated.
How temporary?
>
> I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> constants.
>
> I have tested this with bootstrap compilers on power10 systems and there was no
> regression. I have built GCC with these patches on little endian power9 and
> big endian power8 systems, and there were no regressions.
>
> In addition, I have built and run the full Spec 2017 rate suite, comparing with
> the patches enabled and not enabled. There were roughly 66,000 XXSPLTIDP's
> generated in the rate build for Spec 2017. On a stand-alone system that is
> running single threaded, blender_r has a 1.9% increase in performance, and rest
> of the benchmarks are performance neutral. However, I would expect that in a
> real world scenario, switching to use XXSPLTIDP will increase performance due
> to removing all of the loads.
ok
>
> Can I check this into the master branch?
>
> 2021-08-25 Michael Meissner <meissner@linux.ibm.com>
>
> gcc/
> * config/rs6000/constraints.md (eF): New constraint.
> * config/rs6000/predicates.md (easy_fp_constant): If we can load
> the scalar constant with XXSPLTIDP, the floating point constant is
> easy.
Could this be shortened to something like the following?
Add clause to accept xxspltidp_operand as easy.
> (xxspltidp_operand): New predicate.
Will there ever be another instruction using the SF/DF CONST_DOUBLE or
V2DF CONST_VECTOR? I tentatively question the name of the operand,
but will defer.
> (easy_vector_constant): If we can generate XXSPLTIDP, mark the
> vector constant as easy.
Duplicated from above.
> * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
> declaration.
> (prefixed_permute_p): Likewise.
> * config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
> (output_vec_const_move): Add support for XXSPLTIDP.
> (prefixed_permute_p): New function.
Duplicated.
> * config/rs6000/rs6000.md (prefixed attribute): Add support for
> permute prefixed instructions.
> (movsf_hardfloat): Add XXSPLTIDP support.
> (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
> (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
> * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
> * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
> support.
> (vsx_move<mode>_32bit): Likewise.
There is no 'e' in mov: per the patch contents below, these are
vsx_mov<mode>_64bit and *vsx_mov<mode>_32bit.
> (vsx_splat_v2df_xxspltidp): New insn.
> (XXSPLTIDP): New mode iterator.
> (xxspltidp_<mode>_internal): New insn and splits.
> (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
> iterated form that also does SFmode, and DFmode.
Swap "an iterated form" with "xxspltidp_<mode>_inst"?
>
> gcc/testsuite/
> * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> ---
> gcc/config/rs6000/constraints.md | 5 +
> gcc/config/rs6000/predicates.md | 17 +++
> gcc/config/rs6000/rs6000-protos.h | 2 +
> gcc/config/rs6000/rs6000.c | 106 ++++++++++++++++++
> gcc/config/rs6000/rs6000.md | 45 +++++---
> gcc/config/rs6000/rs6000.opt | 4 +
> gcc/config/rs6000/vsx.md | 64 ++++++++++-
> .../powerpc/vec-splat-constant-df.c | 60 ++++++++++
> .../powerpc/vec-splat-constant-sf.c | 60 ++++++++++
> .../powerpc/vec-splat-constant-v2df.c | 64 +++++++++++
> 10 files changed, 405 insertions(+), 22 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
>
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..ea2e4a267c3 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -208,6 +208,11 @@ (define_constraint "P"
> (and (match_code "const_int")
> (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
>
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> + "A vector constant that can be loaded with the XXSPLTIDP instruction."
> + (match_operand 0 "xxspltidp_operand"))
> +
> ;; 34-bit signed integer constant
> (define_constraint "eI"
> "A signed 34-bit integer constant if prefixed instructions are supported."
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..134243e404b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
> if (TARGET_VSX && op == CONST0_RTX (mode))
> return 1;
>
> + /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
> + be loaded with that instruction. */
> + if (xxspltidp_operand (op, mode))
> + return 1;
> +
> /* Otherwise consider floating point constants hard, so that the
> constant gets pushed to memory during the early RTL phases. This
> has the advantage that double precision constants that can be
> @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit"
> return num_insns == 1;
> })
>
> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> ;; loaded via the ISA 3.1 XXSPLTIDP instruction.
"Return 1 if" doesn't seem right, given that the return statement here
returns the bool result of xxspltidp_constant_p.
> +(define_predicate "xxspltidp_operand"
> + (match_code "const_double,const_vector,vec_duplicate")
> +{
> + HOST_WIDE_INT value = 0;
> + return xxspltidp_constant_p (op, mode, &value);
> +})
> +
> ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
> ;; vector register without using memory.
> (define_predicate "easy_vector_constant"
> @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant"
> if (zero_constant (op, mode) || all_ones_constant (op, mode))
> return true;
>
> + if (xxspltidp_operand (op, mode))
> + return true;
> +
> if (TARGET_P9_VECTOR
> && xxspltib_constant_p (op, mode, &num_insns, &value))
> return true;
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 14f6b313105..9bba57c22f2 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
>
> extern int easy_altivec_constant (rtx, machine_mode);
> extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
> +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
> extern int vspltis_shifted (rtx);
> extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
> extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
> extern bool prefixed_load_p (rtx_insn *);
> extern bool prefixed_store_p (rtx_insn *);
> extern bool prefixed_paddi_p (rtx_insn *);
> +extern bool prefixed_permute_p (rtx_insn *);
> extern void rs6000_asm_output_opcode (FILE *);
> extern void output_pcrel_opt_reloc (rtx);
> extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e073b26b430..322b3c83925 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op,
> return true;
> }
>
> +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
> + XXSPLTIDP instruction.
> +
> + Return the constant that is being split via CONSTANT_PTR to use in the
> + XXSPLTIDP instruction. */
Appears to return true or false. Is the "Return the constant" comment
meant to go on the predicate definition earlier?
> +
> +bool
> +xxspltidp_constant_p (rtx op,
> + machine_mode mode,
> + HOST_WIDE_INT *constant_ptr)
> +{
> + *constant_ptr = 0;
> +
> + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> + return false;
> +
> + if (mode == VOIDmode)
> + mode = GET_MODE (op);
> +
> + rtx element = op;
> + if (mode == V2DFmode)
> + {
> + if (CONST_VECTOR_P (op))
> + {
> + element = CONST_VECTOR_ELT (op, 0);
> + if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> + return false;
> + }
> +
> + else if (GET_CODE (op) == VEC_DUPLICATE)
> + element = XEXP (op, 0);
> +
> + else
> + return false;
> +
> + mode = DFmode;
> + }
> +
> + if (mode != SFmode && mode != DFmode)
> + return false;
> +
> + if (GET_MODE (element) != mode)
> + return false;
> +
> + if (!CONST_DOUBLE_P (element))
> + return false;
> +
> + /* Don't return true for 0.0 since that is easy to create without
> + XXSPLTIDP. */
> + if (element == CONST0_RTX (mode))
> + return false;
> +
> + /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP. */
> + const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
> + if (!exact_real_truncate (SFmode, rv))
> + return false;
The 'exactly' caught my eye. Per a glance at the comments in
exact_real_truncate, this indicates that the value is identical after
conversion to the new format. Ok.
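For anyone wanting to check a constant by hand, the 'exactly' condition can
be approximated outside of GCC with a round trip through float. This is a
sketch under assumptions, not what exact_real_truncate does internally: the
helper name fits_sf_exactly is hypothetical, the host is assumed to use
IEEE 754 float and double, and NaN is simply accepted without looking at
payload bits.

```c
#include <float.h>
#include <math.h>
#include <stdbool.h>

/* Return true if D survives a round trip through float unchanged.
   Hypothetical helper for illustration only.  */
static bool
fits_sf_exactly (double d)
{
  /* NaN never compares equal to itself, so the round trip cannot be
     used; accepting every NaN is a simplification of this sketch.  */
  if (isnan (d) || isinf (d))
    return true;

  /* Converting an out-of-range double to float is undefined in ISO C,
     so reject such values before the cast.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return false;

  /* volatile forces the value to be rounded to float precision.  */
  volatile float f = (float) d;
  return (double) f == d;
}
```

With this, 1.0 fits while M_PI does not, which lines up with the tests
expecting XXSPLTIDP for (float)M_PI in the SF test but PLFD for M_PI in the
DF test. A double equal to a float denormal also passes this check; the
patch rejects those separately.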
> +
> + long value;
> + REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
> +
> + /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero). */
> + if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
> + return false;
> +
> + *constant_ptr = value;
> + return true;
> +}
ok
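The denormal test at the end of the function can be tried in isolation. A
minimal sketch in plain C, using the same masks as the patch; the helper
name sf_bits_denormal_p is hypothetical, and it takes the raw binary32 bit
pattern rather than an rtx.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return true if BITS, taken as an IEEE 754 binary32 pattern, is a
   denormal: exponent field all zero, mantissa field non-zero.  The sign
   bit is ignored.  Hypothetical helper, for illustration only.  */
static bool
sf_bits_denormal_p (uint32_t bits)
{
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}
```

0x1p-149f from the tests has the bit pattern 0x00000001, so it is rejected,
while 0x3F800000 (1.0f) and both zeros are not flagged as denormal.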
> +
> const char *
> output_vec_const_move (rtx *operands)
> {
> @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands)
> {
> bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
> int xxspltib_value = 256;
> + HOST_WIDE_INT xxspltidp_value = 0;
> int num_insns = -1;
>
> if (zero_constant (vec, mode))
> @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands)
> gcc_unreachable ();
> }
>
> + if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
> + {
> + operands[2] = GEN_INT (xxspltidp_value);
> + return "xxspltidp %x0,%2";
> + }
> +
> if (TARGET_P9_VECTOR
> && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
> {
ok
> @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn)
> return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
> }
>
> +/* Whether a permute type instruction is a prefixed instruction. This is
> + called from the prefixed attribute processing. */
> +
> +bool
> +prefixed_permute_p (rtx_insn *insn)
> +{
> + rtx set = single_set (insn);
> + if (!set)
> + return false;
> +
> + rtx dest = SET_DEST (set);
> + rtx src = SET_SRC (set);
> + machine_mode mode = GET_MODE (dest);
> +
> + if (!REG_P (dest) && !SUBREG_P (dest))
> + return false;
> +
> + switch (mode)
> + {
> + case DFmode:
> + case SFmode:
> + case V2DFmode:
> + return xxspltidp_operand (src, mode);
> +
> + default:
> + break;
> + }
> +
> + return false;
> +}
> +
ok
> /* Whether the next instruction needs a 'p' prefix issued before the
> instruction is printed out. */
> static bool prepend_p_to_next_insn;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a84438f8545..bf3bfed3b88 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
>
> (eq_attr "type" "integer,add")
> (if_then_else (match_test "prefixed_paddi_p (insn)")
> + (const_string "yes")
> + (const_string "no"))
> +
> + (eq_attr "type" "vecperm")
> + (if_then_else (match_test "prefixed_permute_p (insn)")
> (const_string "yes")
> (const_string "no"))]
>
> @@ -7723,17 +7728,17 @@ (define_split
> ;;
> ;; LWZ LFS LXSSP LXSSPX STFS STXSSP
> ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP
> -;; MR MT<x> MF<x> NOP
> +;; MR MT<x> MF<x> NOP XXSPLTIDP
>
> (define_insn "movsf_hardfloat"
> [(set (match_operand:SF 0 "nonimmediate_operand"
> "=!r, f, v, wa, m, wY,
> Z, m, wa, !r, f, wa,
> - !r, *c*l, !r, *h")
> + !r, *c*l, !r, *h, wa")
> (match_operand:SF 1 "input_operand"
> "m, m, wY, Z, f, v,
> wa, r, j, j, f, wa,
> - r, r, *h, 0"))]
> + r, r, *h, 0, eF"))]
> "(register_operand (operands[0], SFmode)
> || register_operand (operands[1], SFmode))
> && TARGET_HARD_FLOAT
> @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> mr %0,%1
> mt%0 %1
> mf%1 %0
> - nop"
> + nop
> + #"
> [(set_attr "type"
> "load, fpload, fpload, fpload, fpstore, fpstore,
> fpstore, store, veclogical, integer, fpsimple, fpsimple,
> - *, mtjmpr, mfjmpr, *")
> + *, mtjmpr, mfjmpr, *, vecperm")
> (set_attr "isa"
> "*, *, p9v, p8v, *, p9v,
> p8v, *, *, *, *, *,
> - *, *, *, *")])
> + *, *, *, *, p10")])
OK, I think. The addition of vecperm for the type and p10 for the isa
entries caught my eye, but I expect this is obvious to others.
>
> ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ
> ;; FMR MR MT%0 MF%1 NOP
> @@ -8023,18 +8029,18 @@ (define_split
>
> ;; STFD LFD FMR LXSD STXSD
> ;; LXSD STXSD XXLOR XXLXOR GPR<-0
> -;; LWZ STW MR
> +;; LWZ STW MR XXSPLTIDP
>
>
> (define_insn "*mov<mode>_hardfloat32"
> [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> "=m, d, d, <f64_p9>, wY,
> <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r,
> - Y, r, !r")
> + Y, r, !r, wa")
> (match_operand:FMOVE64 1 "input_operand"
> "d, m, d, wY, <f64_p9>,
> Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>,
> - r, Y, r"))]
> + r, Y, r, eF"))]
> "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
> && (gpc_reg_operand (operands[0], <MODE>mode)
> || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
> #
> #
> #
> + #
> #"
> [(set_attr "type"
> "fpstore, fpload, fpsimple, fpload, fpstore,
> fpload, fpstore, veclogical, veclogical, two,
> - store, load, two")
> + store, load, two, vecperm")
> (set_attr "size" "64")
> (set_attr "length"
> "*, *, *, *, *,
> *, *, *, *, 8,
> - 8, 8, 8")
> + 8, 8, 8, *")
> (set_attr "isa"
> "*, *, *, p9v, p9v,
> p7v, p7v, *, *, *,
> - *, *, *")])
> + *, *, *, p10")])
>
> ;; STW LWZ MR G-const H-const F-const
>
> @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
> ;; STFD LFD FMR LXSD STXSD
> ;; LXSDX STXSDX XXLOR XXLXOR LI 0
> ;; STD LD MR MT{CTR,LR} MF{CTR,LR}
> -;; NOP MFVSRD MTVSRD
> +;; NOP MFVSRD MTVSRD XXSPLTIDP
>
> (define_insn "*mov<mode>_hardfloat64"
> [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
> "=m, d, d, <f64_p9>, wY,
> <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r,
> YZ, r, !r, *c*l, !r,
> - *h, r, <f64_dm>")
> + *h, r, <f64_dm>, wa")
> (match_operand:FMOVE64 1 "input_operand"
> "d, m, d, wY, <f64_p9>,
> Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>,
> r, YZ, r, r, *h,
> - 0, <f64_dm>, r"))]
> + 0, <f64_dm>, r, eF"))]
> "TARGET_POWERPC64 && TARGET_HARD_FLOAT
> && (gpc_reg_operand (operands[0], <MODE>mode)
> || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
> mf%1 %0
> nop
> mfvsrd %0,%x1
> - mtvsrd %x0,%1"
> + mtvsrd %x0,%1
> + #"
> [(set_attr "type"
> "fpstore, fpload, fpsimple, fpload, fpstore,
> fpload, fpstore, veclogical, veclogical, integer,
> store, load, *, mtjmpr, mfjmpr,
> - *, mfvsr, mtvsr")
> + *, mfvsr, mtvsr, vecperm")
> (set_attr "size" "64")
> (set_attr "isa"
> "*, *, *, p9v, p9v,
> p7v, p7v, *, *, *,
> *, *, *, *, *,
> - *, p8v, p8v")])
> + *, p8v, p8v, p10")])
>
> ;; STD LD MR MT<SPR> MF<SPR> G-const
> ;; H-const F-const Special
Ok.
> @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
> (set_attr "length"
> "*, *, *, *, *, 8,
> 12, 16, *")])
> +
>
Unnecessary blank line?
> (define_expand "mov<mode>"
> [(set (match_operand:FMOVE128 0 "general_operand")
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 0538db387dc..928c4fafe07 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
> mprivileged
> Target Var(rs6000_privileged) Init(0)
> Generate code that will run in privileged state.
> +
> +mxxspltidp
> +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
> +Generate (do not generate) XXSPLTIDP instructions.
Ok.
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..af9a04870d4 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
> ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
>
> ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR)
> +;; XXSPLTIDP
> ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW
> ;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX)
> (define_insn "vsx_mov<mode>_64bit"
> [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> "=ZwO, wa, wa, r, we, ?wQ,
> + wa,
> ?&r, ??r, ??Y, <??r>, wa, v,
> ?wa, v, <??r>, wZ, v")
>
> (match_operand:VSX_M 1 "input_operand"
> "wa, ZwO, wa, we, r, r,
> + eF,
> wQ, Y, r, r, wE, jwM,
> ?jwM, W, <nW>, v, wZ"))]
>
> @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit"
> }
> [(set_attr "type"
> "vecstore, vecload, vecsimple, mtvsr, mfvsr, load,
> + vecperm,
> store, load, store, *, vecsimple, vecsimple,
> vecsimple, *, *, vecstore, vecload")
> (set_attr "num_insns"
> "*, *, *, 2, *, 2,
> + *,
> 2, 2, 2, 2, *, *,
> *, 5, 2, *, *")
> (set_attr "max_prefixed_insns"
> "*, *, *, *, *, 2,
> + *,
> 2, 2, 2, 2, *, *,
> *, *, *, *, *")
> (set_attr "length"
> "*, *, *, 8, *, 8,
> + *,
> 8, 8, 8, 8, *, *,
> *, 20, 8, *, *")
> (set_attr "isa"
> "<VSisa>, <VSisa>, <VSisa>, *, *, *,
> + p10,
> *, *, *, *, p9v, *,
> <VSisa>, *, *, *, *")])
>
> ;; VSX store VSX load VSX move GPR load GPR store GPR move
> +;; XXSPLTIDP
> ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const
> ;; LVX (VMX) STVX (VMX)
> (define_insn "*vsx_mov<mode>_32bit"
> [(set (match_operand:VSX_M 0 "nonimmediate_operand"
> "=ZwO, wa, wa, ??r, ??Y, <??r>,
> + wa,
> wa, v, ?wa, v, <??r>,
> wZ, v")
>
> (match_operand:VSX_M 1 "input_operand"
> "wa, ZwO, wa, Y, r, r,
> + eF,
> wE, jwM, ?jwM, W, <nW>,
> v, wZ"))]
>
> @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
> }
> [(set_attr "type"
> "vecstore, vecload, vecsimple, load, store, *,
> + vecperm,
> vecsimple, vecsimple, vecsimple, *, *,
> vecstore, vecload")
> (set_attr "length"
> "*, *, *, 16, 16, 16,
> + *,
> *, *, *, 20, 16,
> *, *")
> (set_attr "isa"
> "<VSisa>, <VSisa>, <VSisa>, *, *, *,
> + p10,
> p9v, *, <VSisa>, *, *,
> *, *")])
>
ok
> @@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg"
> mtvsrdd %x0,%1,%1"
> [(set_attr "type" "vecperm,vecmove")])
>
> +(define_insn "*vsx_splat_v2df_xxspltidp"
> + [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
> + (vec_duplicate:V2DF
> + (match_operand:DF 1 "xxspltidp_operand" "eF")))]
> + "TARGET_POWER10"
> +{
> + HOST_WIDE_INT value;
> +
> + if (!xxspltidp_constant_p (operands[1], DFmode, &value))
> + gcc_unreachable ();
> +
> + operands[2] = GEN_INT (value);
> + return "xxspltidp %x0,%1";
> +}
> + [(set_attr "type" "vecperm")
> + (set_attr "prefixed" "yes")])
> +
> (define_insn "vsx_splat_<mode>_mem"
> [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
> (vec_duplicate:VSX_D
> @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df"
> DONE;
> })
>
> -(define_insn "xxspltidp_v2df_inst"
> - [(set (match_operand:V2DF 0 "register_operand" "=wa")
> - (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> - UNSPEC_XXSPLTIDP))]
> +(define_mode_iterator XXSPLTIDP [SF DF V2DF])
> +
> +(define_insn "xxspltidp_<mode>_inst"
> + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
> + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> + UNSPEC_XXSPLTIDP))]
> "TARGET_POWER10"
> "xxspltidp %x0,%1"
> [(set_attr "type" "vecperm")
> (set_attr "prefixed" "yes")])
>
> +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar
> +;; constants and V2DF vector constants where both elements are the same. The
> +;; constant has to be expressible as a SFmode constant that is not a SFmode
> +;; denormal value.
> +(define_insn_and_split "*xxspltidp_<mode>_internal"
> + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
> + (match_operand:XXSPLTIDP 1 "xxspltidp_operand" "eF"))]
Extra spaces there.
> + "TARGET_POWER10"
> + "#"
> + "&& 1"
> + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
> + (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
> +{
> + HOST_WIDE_INT value = 0;
> +
> + if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
> + gcc_unreachable ();
> +
> + operands[2] = GEN_INT (value);
> +}
> + [(set_attr "type" "vecperm")
> + (set_attr "prefixed" "yes")])
> +
> ;; XXSPLTI32DX built-in function support
> (define_expand "xxsplti32dx_v4si"
> [(set (match_operand:V4SI 0 "register_operand" "=wa")
ok
I just briefly looked at the testcases; nothing jumped out at me below.
Thanks
-Will
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> new file mode 100644
> index 00000000000..8f6e176f9af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> + instruction. */
> +
> +double
> +scalar_double_0 (void)
> +{
> + return 0.0; /* XXSPLTIB or XXLXOR. */
> +}
> +
> +double
> +scalar_double_1 (void)
> +{
> + return 1.0; /* XXSPLTIDP. */
> +}
> +
> +#ifndef __FAST_MATH__
> +double
> +scalar_double_m0 (void)
> +{
> + return -0.0; /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_nan (void)
> +{
> + return __builtin_nan (""); /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_inf (void)
> +{
> + return __builtin_inf (); /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_m_inf (void) /* XXSPLTIDP. */
> +{
> + return - __builtin_inf ();
> +}
> +#endif
> +
> +double
> +scalar_double_pi (void)
> +{
> + return M_PI; /* PLFD. */
> +}
> +
> +double
> +scalar_double_denorm (void)
> +{
> + return 0x1p-149f; /* PLFD. */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> new file mode 100644
> index 00000000000..72504bdfbbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> + instruction. */
> +
> +float
> +scalar_float_0 (void)
> +{
> + return 0.0f; /* XXSPLTIB or XXLXOR. */
> +}
> +
> +float
> +scalar_float_1 (void)
> +{
> + return 1.0f; /* XXSPLTIDP. */
> +}
> +
> +#ifndef __FAST_MATH__
> +float
> +scalar_float_m0 (void)
> +{
> + return -0.0f; /* XXSPLTIDP. */
> +}
> +
> +float
> +scalar_float_nan (void)
> +{
> + return __builtin_nanf (""); /* XXSPLTIDP. */
> +}
> +
> +float
> +scalar_float_inf (void)
> +{
> + return __builtin_inff (); /* XXSPLTIDP. */
> +}
> +
> +float
> +scalar_float_m_inf (void) /* XXSPLTIDP. */
> +{
> + return - __builtin_inff ();
> +}
> +#endif
> +
> +float
> +scalar_float_pi (void)
> +{
> + return (float)M_PI; /* XXSPLTIDP. */
> +}
> +
> +float
> +scalar_float_denorm (void)
> +{
> + return 0x1p-149f; /* PLFS. */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> new file mode 100644
> index 00000000000..82ffc86f8aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> @@ -0,0 +1,64 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> + instruction. */
> +
> +vector double
> +v2df_double_0 (void)
> +{
> + return (vector double) { 0.0, 0.0 }; /* XXSPLTIB or XXLXOR. */
> +}
> +
> +vector double
> +v2df_double_1 (void)
> +{
> + return (vector double) { 1.0, 1.0 }; /* XXSPLTIDP. */
> +}
> +
> +#ifndef __FAST_MATH__
> +vector double
> +v2df_double_m0 (void)
> +{
> + return (vector double) { -0.0, -0.0 }; /* XXSPLTIDP. */
> +}
> +
> +vector double
> +v2df_double_nan (void)
> +{
> + return (vector double) { __builtin_nan (""),
> + __builtin_nan ("") }; /* XXSPLTIDP. */
> +}
> +
> +vector double
> +v2df_double_inf (void)
> +{
> + return (vector double) { __builtin_inf (),
> + __builtin_inf () }; /* XXSPLTIDP. */
> +}
> +
> +vector double
> +v2df_double_m_inf (void)
> +{
> + return (vector double) { - __builtin_inf (),
> + - __builtin_inf () }; /* XXSPLTIDP. */
> +}
> +#endif
> +
> +vector double
> +v2df_double_pi (void)
> +{
> + return (vector double) { M_PI, M_PI }; /* PLVX. */
> +}
> +
> +vector double
> +v2df_double_denorm (void)
> +{
> + return (vector double) { (double)0x1p-149f,
> + (double)0x1p-149f }; /* PLVX. */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> --
> 2.31.1
>
>