From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work048)] Use XXSPLTI32DX to generate some constants.
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work048
X-Git-Oldrev: a6fc6d45da05ff8c8e172245ec251cbd07385de4
X-Git-Newrev: e0dfd4a9c73fefb98aaf5102d025d8d7d3cde709
Message-Id: <20210415175626.B8829385700A@sourceware.org>
Date: Thu, 15 Apr 2021 17:56:26 +0000 (GMT)

https://gcc.gnu.org/g:e0dfd4a9c73fefb98aaf5102d025d8d7d3cde709

commit e0dfd4a9c73fefb98aaf5102d025d8d7d3cde709
Author: Michael Meissner
Date:   Thu Apr 15 13:56:07 2021 -0400

    Use XXSPLTI32DX to generate some constants.

    This patch generates a pair of XXSPLTI32DX instructions to load 64-bit
    scalar or 128-bit vector constants into the vector registers.

    I added a new constraint (eD) for constants that can be loaded with
    XXSPLTI32DX, but cannot be loaded with the XXSPLTIDP or XXSPLTIW
    instructions.

    I added a debug switch (-mxxsplti32dx) to control whether this behavior is
    on or off.

    For vector moves, I bumped up the size of expanding a vector constant from
    5 instructions (20 bytes) to 6 instructions (24 bytes). This is to
    accommodate the size of two prefixed instructions.

    gcc/
    2021-04-15  Michael Meissner

        * config/rs6000/altivec.md (UNSPEC_XXSPLTI32DX): Move to vsx.md.
        (xxsplti32dx_v4si): Move to vsx.md.
        (xxsplti32dx_v4si_inst): Move to vsx.md.
        (xxsplti32dx_v4sf): Move to vsx.md.
        (xxsplti32dx_v4sf_inst): Move to vsx.md.
        * config/rs6000/constraints.md (eD): New constraint.
        * config/rs6000/predicates.md (easy_fp_constant): If we can load the
        constant with a pair of XXSPLTI32DX instructions, it is easy.
        (xxsplti32dx_operand): New predicate.
        (easy_vector_constant): If we can load the constant with a pair of
        XXSPLTI32DX instructions, it is easy.
        * config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
        -mxxsplti32dx.
        (POWERPC_MASKS): Add -mxxsplti32dx.
        * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_p): New
        declaration.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Add
        -mxxsplti32dx support.
        (xxsplti32dx_constant_p): New helper function.
        (output_vec_const_move): Split constants that need XXSPLTI32DX.
        (rs6000_opt_masks): Add -mxxsplti32dx.
        * config/rs6000/rs6000.md (movsf_hardfloat): Add support for loading
        constants with XXSPLTI32DX.
        (mov_hardfloat32, FMOVE64 iterator): Add support for loading
        constants with XXSPLTI32DX.
        (mov_hardfloat64, FMOVE64 iterator): Add support for loading
        constants with XXSPLTI32DX.
        * config/rs6000/rs6000.opt (-mxxsplti32dx): New switch.
        * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX): Move unspec here from
        altivec.md.
        (UNSPEC_XXSPLTI32DX_CONST): New unspec.
        (vsx_mov_64bit): Bump up size of 'W' vector constants to accommodate
        a pair of XXSPLTI32DX instructions.
        (vsx_mov_32bit): Bump up size of 'W' vector constants to accommodate
        a pair of XXSPLTI32DX instructions.
        (XXSPLTI32DX): New mode iterator.
        (xxsplti32dx_): New insn and splits.
        (xxsplti32dx__first): New insns.
        (xxsplti32dx__second): New insns.
        (xxsplti32dx_v4si): Move here from altivec.md.
        (xxsplti32dx_v4si_inst): Move here from altivec.md.
        (xxsplti32dx_v4sf): Move here from altivec.md.
        (xxsplti32dx_v4sf_inst): Move here from altivec.md.

    gcc/testsuite/
    2021-04-15  Michael Meissner

        * gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
        * gcc.target/powerpc/vec-splat-constant-sf.c: Update insn count.
        * gcc.target/powerpc/vec-splat-constant-df.c: Update insn count.
        * gcc.target/powerpc/vec-splat-constant-v2df.c: Update insn count.

Diff:
---
 gcc/config/rs6000/altivec.md                       |  60 ---------
 gcc/config/rs6000/constraints.md                   |   6 +
 gcc/config/rs6000/predicates.md                    |  22 ++++
 gcc/config/rs6000/rs6000-cpus.def                  |   2 +
 gcc/config/rs6000/rs6000-protos.h                  |   1 +
 gcc/config/rs6000/rs6000.c                         | 133 ++++++++++++++++++-
 gcc/config/rs6000/rs6000.md                        |  44 +++++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           | 142 ++++++++++++++++++++-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |   5 +-
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  10 +-
 .../gcc.target/powerpc/vec-splati-runnable.c       |   2 +-
 13 files changed, 354 insertions(+), 86 deletions(-)

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad6ead04cfa..9af71e036ab 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -176,7 +176,6 @@
    UNSPEC_VSTRIL
    UNSPEC_SLDB
    UNSPEC_SRDB
-   UNSPEC_XXSPLTI32DX
    UNSPEC_XXBLEND
    UNSPEC_XXPERMX
 ])
@@ -818,65 +817,6 @@
   "vsdbi %0,%1,%2,%3"
   [(set_attr "type" "vecsimple")])
 
-(define_expand "xxsplti32dx_v4si"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
-                      (match_operand:QI 2 "u1bit_cint_operand" "n")
-                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-                     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-{
-  int index = INTVAL (operands[2]);
-
-  if (!BYTES_BIG_ENDIAN)
-    index = 1 - index;
-
-  emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
-                                        GEN_INT (index), operands[3]));
-  DONE;
-}
-  [(set_attr "type" "vecsimple")])
-
-(define_insn "xxsplti32dx_v4si_inst"
-  [(set (match_operand:V4SI 0 "register_operand" "=wa")
-        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
-                      (match_operand:QI 2 "u1bit_cint_operand" "n")
-                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-                     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-  "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
-   (set_attr "prefixed" "yes")])
-
-(define_expand "xxsplti32dx_v4sf"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
-                      (match_operand:QI 2 "u1bit_cint_operand" "n")
-                      (match_operand:SF 3 "const_double_operand" "n")]
-                     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-{
-  int index = INTVAL (operands[2]);
-  long value = rs6000_const_f32_to_i32 (operands[3]);
-  if (!BYTES_BIG_ENDIAN)
-    index = 1 - index;
-
-  emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
-                                        GEN_INT (index), GEN_INT (value)));
-  DONE;
-})
-
-(define_insn "xxsplti32dx_v4sf_inst"
-  [(set (match_operand:V4SF 0 "register_operand" "=wa")
-        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
-                      (match_operand:QI 2 "u1bit_cint_operand" "n")
-                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
-                     UNSPEC_XXSPLTI32DX))]
-  "TARGET_POWER10"
-  "xxsplti32dx %x0,%2,%3"
-  [(set_attr "type" "vecsimple")
-   (set_attr "prefixed" "yes")])
-
 (define_insn "xxblend_"
   [(set (match_operand:VM3 0 "register_operand" "=wa")
         (unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index e1fadd63580..d665e2a94db 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,12 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; SF/DF/V2DF/DI/V2DI scalar or vector constant that can be loaded with a pair
+;; of XXSPLTI32DX instructions.
+(define_constraint "eD"
+  "A vector constant that can be loaded with XXSPLTI32DX instructions."
+  (match_operand 0 "xxsplti32dx_operand"))
+
 ;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
 (define_constraint "eF"
   "A vector constant that can be loaded with the XXSPLTIDP instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8c461ba2b76..01e5e09e0a6 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -606,6 +606,11 @@
   if (xxspltidp_operand (op, mode))
     return 1;
 
+  /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can
+     be loaded with a pair of those instructions. */
+  if (xxsplti32dx_operand (op, mode))
+    return 1;
+
   /* Otherwise consider floating point constants hard, so that the
      constant gets pushed to memory during the early RTL phases. This
      has the advantage that double precision constants that can be
@@ -684,6 +689,20 @@
   return xxspltidp_constant_p (op, mode, &value);
 })
 
+;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
+;; loaded via a pair of ISA 3.1 XXSPLTI32DX instructions. Do not return true if
+;; the value is 0.0 or it can be loaded with XXSPLTIDP, since that is easy to
+;; generate without using XXSPLTI32DX.
+(define_predicate "xxsplti32dx_operand"
+  (match_code "const_double,const_int,const_vector,vec_duplicate")
+{
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  HOST_WIDE_INT value = 0;
+  return xxsplti32dx_constant_p (op, mode, &value);
+})
+
 ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
 ;; vector register without using memory.
 (define_predicate "easy_vector_constant"
@@ -703,6 +722,9 @@
   if (xxspltidp_operand (op, mode))
     return true;
 
+  if (xxsplti32dx_operand (op, mode))
+    return true;
+
   if (TARGET_P9_VECTOR
       && xxspltib_constant_p (op, mode, &num_insns, &value))
     return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 3b657e490b1..b0821b34a69 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -86,6 +86,7 @@
                                  | OPTION_MASK_P10_FUSION \
                                  | OPTION_MASK_P10_FUSION_LD_CMPI \
                                  | OPTION_MASK_P10_FUSION_2LOGICAL \
+                                 | OPTION_MASK_XXSPLTI32DX \
                                  | OPTION_MASK_XXSPLTIDP \
                                  | OPTION_MASK_XXSPLTIW)
 
@@ -163,6 +164,7 @@
                                  | OPTION_MASK_SOFT_FLOAT \
                                  | OPTION_MASK_STRICT_ALIGN_OPTIONAL \
                                  | OPTION_MASK_VSX \
+                                 | OPTION_MASK_XXSPLTI32DX \
                                  | OPTION_MASK_XXSPLTIDP \
                                  | OPTION_MASK_XXSPLTIW)
 #endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index e87a51f42de..a13da44cc3c 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
 extern bool easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
 extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxsplti32dx_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 024eaeecfca..f7f91db5f62 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4481,6 +4481,9 @@ rs6000_option_override_internal (bool global_init_p)
 
   if (TARGET_POWER10 && TARGET_VSX)
     {
+      if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTI32DX) == 0)
+        rs6000_isa_flags |= OPTION_MASK_XXSPLTI32DX;
+
       if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
         rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
 
@@ -4488,7 +4491,9 @@ rs6000_option_override_internal (bool global_init_p)
         rs6000_isa_flags |= OPTION_MASK_XXSPLTIDP;
     }
   else
-    rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW | OPTION_MASK_XXSPLTIDP);
+    rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW
+                          | OPTION_MASK_XXSPLTIDP
+                          | OPTION_MASK_XXSPLTI32DX);
 
   if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
     rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -6549,6 +6554,128 @@ xxspltidp_constant_p (rtx op,
   return true;
 }
 
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+   XXSPLTI32DX instruction. If the instruction can be synthesized with
+   XXSPLTIDP or is 0/-1, return false.
+
+   Return the 64-bit constant to use in the two XXSPLTI32DX instructions via
+   CONSTANT_PTR. */
+
+bool
+xxsplti32dx_constant_p (rtx op,
+                        machine_mode mode,
+                        HOST_WIDE_INT *constant_ptr)
+{
+  *constant_ptr = 0;
+
+  if (!TARGET_XXSPLTI32DX)
+    return false;
+
+  if (mode == VOIDmode)
+    mode = GET_MODE (op);
+
+  if (op == CONST0_RTX (mode))
+    return false;
+
+  rtx element = op;
+  if (mode == V2DFmode || mode == V2DImode)
+    {
+      /* Handle VEC_DUPLICATE and CONST_VECTOR. */
+      if (GET_CODE (op) == VEC_DUPLICATE)
+        element = XEXP (op, 0);
+
+      else if (GET_CODE (op) == CONST_VECTOR)
+        {
+          element = CONST_VECTOR_ELT (op, 0);
+          if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
+            return false;
+        }
+
+      else
+        return false;
+
+      mode = GET_MODE_INNER (mode);
+    }
+
+  else if (mode == V4SImode || mode == V4SFmode)
+    {
+      /* For V4SI/V4SF, the XXSPLTI32DX instruction pair can represent vectors
+         where the two even elements are equal and the two odd elements are
+         equal. */
+      if (GET_CODE (op) != CONST_VECTOR)
+        return false;
+
+      rtx op0 = CONST_VECTOR_ELT (op, 0);
+      if (!rtx_equal_p (op0, CONST_VECTOR_ELT (op, 2)))
+        return false;
+
+      rtx op1 = CONST_VECTOR_ELT (op, 1);
+      if (!rtx_equal_p (op1, CONST_VECTOR_ELT (op, 3)))
+        return false;
+
+      if (rtx_equal_p (op0, op1))
+        return false;
+
+      long op0_value;
+      long op1_value;
+      if (mode == V4SImode)
+        {
+          op0_value = INTVAL (op0);
+          op1_value = INTVAL (op1);
+        }
+      else
+        {
+          op0_value = rs6000_const_f32_to_i32 (op0);
+          op1_value = rs6000_const_f32_to_i32 (op1);
+        }
+
+      *constant_ptr = (op0_value << 32) | (op1_value & 0xffffffff);
+      return true;
+    }
+
+  if (GET_MODE (element) != mode)
+    return false;
+
+  /* Handle floating point constants. */
+  if (mode == SFmode || mode == DFmode)
+    {
+      HOST_WIDE_INT xxspltidp_value = 0;
+
+      if (!CONST_DOUBLE_P (element))
+        return false;
+
+      if (xxspltidp_constant_p (element, mode, &xxspltidp_value))
+        return false;
+
+      long high_low[2];
+      const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
+      REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+      if (!BYTES_BIG_ENDIAN)
+        std::swap (high_low[0], high_low[1]);
+
+      *constant_ptr = (high_low[0] << 32) | (high_low[1] & 0xffffffff);
+      return true;
+    }
+
+  /* Handle integer constants. */
+  else if (mode == DImode)
+    {
+      if (!CONST_INT_P (element))
+        return false;
+
+      HOST_WIDE_INT value = INTVAL (element);
+      if (value == -1)
+        return false;
+
+      *constant_ptr = value;
+      return true;
+    }
+
+  else
+    return false;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
@@ -6597,6 +6724,9 @@ output_vec_const_move (rtx *operands)
               || xxspltidp_operand (vec, mode))
             return "#";
 
+          if (xxsplti32dx_operand (vec, mode))
+            return "#";
+
           if (TARGET_P9_VECTOR
               && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
             {
@@ -24094,6 +24224,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
   { "string", 0, false, true },
   { "update", OPTION_MASK_NO_UPDATE, true , true },
   { "vsx", OPTION_MASK_VSX, false, true },
+  { "xxsplti32dx", OPTION_MASK_XXSPLTI32DX, false, true },
   { "xxspltiw", OPTION_MASK_XXSPLTIW, false, true },
   { "xxspltidp", OPTION_MASK_XXSPLTIDP, false, true },
 #ifdef OPTION_MASK_64BIT
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 3d4dc820bdd..9e7f507c440 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7612,17 +7612,17 @@
 ;;
 ;; LWZ LFS LXSSP LXSSPX STFS STXSSP
 ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP
-;; MR MT MF NOP XXSPLTIDP
+;; MR MT MF NOP XXSPLTIDP XXSPLTI32DX
 
 (define_insn "movsf_hardfloat"
   [(set (match_operand:SF 0 "nonimmediate_operand"
          "=!r, f, v, wa, m, wY,
           Z, m, wa, !r, f, wa,
-          !r, *c*l, !r, *h, wa")
+          !r, *c*l, !r, *h, wa, wa")
         (match_operand:SF 1 "input_operand"
          "m, m, wY, Z, f, v,
           wa, r, j, j, f, wa,
-          r, r, *h, 0, eF"))]
+          r, r, *h, 0, eF, eD"))]
   "(register_operand (operands[0], SFmode)
    || register_operand (operands[1], SFmode))
    && TARGET_HARD_FLOAT
@@ -7645,19 +7645,28 @@
    mt%0 %1
    mf%1 %0
    nop
+   #
    #"
   [(set_attr "type"
         "load, fpload, fpload, fpload, fpstore, fpstore,
          fpstore, store, veclogical, integer, fpsimple, fpsimple,
-         *, mtjmpr, mfjmpr, *, vecperm")
+         *, mtjmpr, mfjmpr, *, vecperm, vecperm")
    (set_attr "isa"
         "*, *, p9v, p8v, *, p9v,
          p8v, *, *, *, *, *,
-         *, *, *, *, p10")
+         *, *, *, *, p10, p10")
    (set_attr "prefixed"
         "*, *, *, *, *, *,
          *, *, *, *, *, *,
-         *, *, *, *, yes")])
+         *, *, *, *, yes, yes")
+   (set_attr "max_prefixed_insns"
+        "*, *, *, *, *, *,
+         *, *, *, *, *, *,
+         *, *, *, *, *, 2")
+   (set_attr "num_insns"
+        "*, *, *, *, *, *,
+         *, *, *, *, *, *,
+         *, *, *, *, *, 2")])
 
 ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ
 ;; FMR MR MT%0 MF%1 NOP
@@ -7917,18 +7926,18 @@
 
 ;; STFD LFD FMR LXSD STXSD
 ;; LXSD STXSD XXLOR XXLXOR GPR<-0
-;; LWZ STW MR XXSPLTIDP
+;; LWZ STW MR XXSPLTIDP XXSPLTI32DX
 
 
 (define_insn "*mov_hardfloat32"
   [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
          "=m, d, d, , wY,
           , Z, , , !r,
-          Y, r, !r, wa")
+          Y, r, !r, wa, wa")
         (match_operand:FMOVE64 1 "input_operand"
          "d, m, d, wY, ,
           Z, , , , ,
-          r, Y, r, eF"))]
+          r, Y, r, eF, eD"))]
   "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
    && (gpc_reg_operand (operands[0], mode)
        || gpc_reg_operand (operands[1], mode))"
@@ -7946,24 +7955,33 @@
    #
    #
    #
+   #
    #"
   [(set_attr "type"
         "fpstore, fpload, fpsimple, fpload, fpstore,
          fpload, fpstore, veclogical, veclogical, two,
-         store, load, two, vecperm")
+         store, load, two, vecperm, vecperm")
    (set_attr "size" "64")
    (set_attr "length"
         "*, *, *, *, *,
          *, *, *, *, 8,
-         8, 8, 8, *")
+         8, 8, 8, *, *")
    (set_attr "isa"
         "*, *, *, p9v, p9v,
          p7v, p7v, *, *, *,
-         *, *, *, p10")
+         *, *, *, p10, p10")
    (set_attr "prefixed"
         "*, *, *, *, *,
          *, *, *, *, *,
-         *, *, *, yes")])
+         *, *, *, yes, yes")
+   (set_attr "max_prefixed_insns"
+        "*, *, *, *, *,
+         *, *, *, *, *,
+         *, *, *, *, 2")
+   (set_attr "num_insns"
+        "*, *, *, *, *,
+         *, *, *, *, *,
+         *, *, *, *, 2")])
 
 ;; STW LWZ MR G-const H-const F-const
 
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 6620cdb7716..bd269369ca0 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Generate (do not generate) the XXSPLTIW instruction.
 mxxspltidp
 Target Undocumented Mask(XXSPLTIDP) Var(rs6000_isa_flags)
 Generate (do not generate) the XXSPLTIDP instruction.
+
+mxxsplti32dx
+Target Undocumented Mask(XXSPLTI32DX) Var(rs6000_isa_flags)
+Generate (do not generate) the XXSPLTI32DX instruction.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index f2dfef88c90..16d0555a684 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -370,6 +370,8 @@
    UNSPEC_VDIVES
    UNSPEC_VDIVEU
    UNSPEC_XXSPLTIDP
+   UNSPEC_XXSPLTI32DX
+   UNSPEC_XXSPLTI32DX_CONST
   ])
 
 (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -1193,7 +1195,7 @@
    (set_attr "num_insns"
         "*, *, *, 2, *, 2,
          2, 2, 2, 2, *, *,
-         *, 5, 2, *, *")
+         *, 6, 2, *, *")
    (set_attr "max_prefixed_insns"
         "*, *, *, *, *, 2,
          2, 2, 2, 2, *, *,
@@ -1201,7 +1203,7 @@
    (set_attr "length"
         "*, *, *, 8, *, 8,
          8, 8, 8, 8, *, *,
-         *, 20, 8, *, *")
+         *, 24, 8, *, *")
    (set_attr "isa"
         ", , , *, *, *,
          *, *, *, *, p9v, *,
@@ -1233,7 +1235,7 @@
          vecstore, vecload")
    (set_attr "length"
         "*, *, *, 16, 16, 16,
-         *, *, *, 20, 16,
+         *, *, *, 24, 16,
          *, *")
    (set_attr "isa"
         ", , , *, *, *,
@@ -6326,3 +6328,137 @@
   rs6000_emit_xxspltidp_v2df (operands[0], value);
   DONE;
 })
+
+;; XXSPLTI32DX used to create 64-bit constants or 32-bit vector constants where
+;; the even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [SF DF V4SF V4SI V2DF V2DI])
+
+(define_insn_and_split "*xxsplti32dx_"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+        (match_operand:XXSPLTI32DX 1 "xxsplti32dx_operand"))]
+  "TARGET_XXSPLTI32DX"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+        (unspec:XXSPLTI32DX [(match_dup 2)
+                             (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+   (set (match_dup 0)
+        (unspec:XXSPLTI32DX [(match_dup 0)
+                             (match_dup 4)
+                             (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+  HOST_WIDE_INT value = 0;
+
+  if (!xxsplti32dx_constant_p (operands[1], mode, &value))
+    gcc_unreachable ();
+
+  HOST_WIDE_INT high = value >> 32;
+  HOST_WIDE_INT low = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+  /* If the low bits are 0 or all 1s, initialize that word first. This way we
+     can use a smaller XXSPLTIB instruction instead of the first XXSPLTI32DX. */
+  if (low == 0 || low == -1)
+    {
+      operands[2] = const1_rtx;
+      operands[3] = GEN_INT (low);
+      operands[4] = const0_rtx;
+      operands[5] = GEN_INT (high);
+    }
+  else
+    {
+      operands[2] = const0_rtx;
+      operands[3] = GEN_INT (high);
+      operands[4] = const1_rtx;
+      operands[5] = GEN_INT (low);
+    }
+}
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")
+   (set_attr "num_insns" "2")
+   (set_attr "max_prefixed_insns" "2")])
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx__first"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+        (unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+                             (match_operand 2 "const_int_operand" "O,wM,n")]
+                            UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "@
+   xxspltib %x0,0
+   xxspltib %x0,255
+   xxsplti32dx %x0,%1,%2"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx__second"
+  [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+        (unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+                             (match_operand 2 "u1bit_cint_operand" "n")
+                             (match_operand 3 "const_int_operand" "n")]
+                            UNSPEC_XXSPLTI32DX_CONST))]
+  "TARGET_XXSPLTI32DX"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecperm")
+   (set_attr "prefixed" "yes")])
+
+;; XXSPLTI32DX built-in support.
+(define_expand "xxsplti32dx_v4si"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+                      (match_operand:QI 2 "u1bit_cint_operand" "n")
+                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+                     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+  emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
+                                        GEN_INT (index), operands[3]));
+  DONE;
+}
+  [(set_attr "type" "vecsimple")])
+
+(define_insn "xxsplti32dx_v4si_inst"
+  [(set (match_operand:V4SI 0 "register_operand" "=wa")
+        (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+                      (match_operand:QI 2 "u1bit_cint_operand" "n")
+                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+                     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
+
+(define_expand "xxsplti32dx_v4sf"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+                      (match_operand:QI 2 "u1bit_cint_operand" "n")
+                      (match_operand:SF 3 "const_double_operand" "n")]
+                     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+{
+  int index = INTVAL (operands[2]);
+  long value = rs6000_const_f32_to_i32 (operands[3]);
+  if (!BYTES_BIG_ENDIAN)
+    index = 1 - index;
+
+  emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
+                                        GEN_INT (index), GEN_INT (value)));
+  DONE;
+})
+
+(define_insn "xxsplti32dx_v4sf_inst"
+  [(set (match_operand:V4SF 0 "register_operand" "=wa")
+        (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+                      (match_operand:QI 2 "u1bit_cint_operand" "n")
+                      (match_operand:SI 3 "s32bit_cint_operand" "n")]
+                     UNSPEC_XXSPLTI32DX))]
+  "TARGET_POWER10"
+  "xxsplti32dx %x0,%2,%3"
+  [(set_attr "type" "vecsimple")
+   (set_attr "prefixed" "yes")])
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
index 8f6e176f9af..1435ef4ef4f 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -48,13 +48,16 @@ scalar_double_m_inf (void) /* XXSPLTIDP. */
 double
 scalar_double_pi (void)
 {
-  return M_PI; /* PLFD. */
+  return M_PI; /* 2x XXSPLTI32DX. */
 }
 
 double
 scalar_double_denorm (void)
 {
-  return 0x1p-149f; /* PLFD. */
+  return 0x1p-149f; /* XXSPLTIB, XXSPLTI32DX. */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not {\mplfd\M} } } */
+/* { dg-final { scan-assembler-not {\mplxsd\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
index 72504bdfbbd..e9a45d5159d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -57,4 +57,7 @@ scalar_float_denorm (void)
   return 0x1p-149f; /* PLFS. */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mplfs\M} } } */
+/* { dg-final { scan-assembler-not {\mplxssp\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
index d509459292c..d81198b163d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -51,14 +51,16 @@ v2df_double_m_inf (void)
 vector double
 v2df_double_pi (void)
 {
-  return (vector double) { M_PI, M_PI }; /* PLFD. */
+  return (vector double) { M_PI, M_PI }; /* 2x XXSPLTI32DX. */
 }
 
 vector double
 v2df_double_denorm (void)
 {
-  return (vector double) { (double)0x1p-149f,
-                           (double)0x1p-149f }; /* PLFD. */
+  return (vector double) { (double)0x1p-149f, /* XXSPLTIB, */
+                           (double)0x1p-149f }; /* XXSPLTI32DX. */
 }
 
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not {\mplxv\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 06a8289d09b..f0eb982eadf 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -162,4 +162,4 @@ main (int argc, char *argv [])
 
 /* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
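
Illustrative note (not part of the commit): the word-ordering logic in the
xxsplti32dx split in vsx.md can be sketched in standalone C as below. This is
a minimal sketch under stated assumptions, not GCC code. The names word_load,
load_plan, sext32 and plan_xxsplti32dx are hypothetical and exist only for
this example. The sketch assumes, as the "O" and "wM" alternatives of the
first-word insn suggest, that a first word of 0 or -1 is emitted as XXSPLTIB,
and it does not model any endian adjustment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical types for illustration; none of these names exist in GCC.  */
struct word_load
{
  int index;      /* 0 = high word of each doubleword, 1 = low word.  */
  int32_t value;  /* 32-bit immediate, sign-extended.  */
};

struct load_plan
{
  struct word_load first;
  struct word_load second;
  bool first_is_xxspltib; /* First step can be XXSPLTIB with 0 or 255.  */
};

/* Sign-extend the low 32 bits of V, mirroring the patch's
   ((value & 0xffffffff) ^ 0x80000000) - 0x80000000.  */
static int32_t
sext32 (uint64_t v)
{
  int64_t x = (int64_t) ((v & 0xffffffffu) ^ 0x80000000u) - 0x80000000LL;
  return (int32_t) x;
}

/* Decide which 32-bit word to load first for the 64-bit constant VALUE.
   If the low word is 0 or -1 it goes first, so that the first step can be
   a non-prefixed XXSPLTIB; otherwise the high word goes first.  */
static struct load_plan
plan_xxsplti32dx (uint64_t value)
{
  int32_t high = sext32 (value >> 32);
  int32_t low = sext32 (value);
  struct load_plan plan;

  assert (high != 0 || low != 0); /* Zero is excluded by the predicate.  */

  if (low == 0 || low == -1)
    {
      plan.first.index = 1;
      plan.first.value = low;
      plan.second.index = 0;
      plan.second.value = high;
    }
  else
    {
      plan.first.index = 0;
      plan.first.value = high;
      plan.second.index = 1;
      plan.second.value = low;
    }

  /* Assumption: a first word of 0 or -1 is emitted as XXSPLTIB, which
     fills the whole register before the second word is inserted.  */
  plan.first_is_xxspltib = (plan.first.value == 0 || plan.first.value == -1);
  return plan;
}
```

With the IEEE double bit pattern of M_PI (0x400921FB54442D18) the plan is two
XXSPLTI32DX steps, high word first. With 2^-149 as a double
(0x36A0000000000000) the low word is zero, so the plan is XXSPLTIB followed by
one XXSPLTI32DX, which agrees with the updated comments in
vec-splat-constant-df.c.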