From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work070)] Generate XXSPLTI32DX on power10.
X-Act-Checkin: gcc
X-Git-Author: Michael Meissner
X-Git-Refname: refs/users/meissner/heads/work070
X-Git-Oldrev: e336a9e635d7a0e621549da6a04264e6f259c5a3
X-Git-Newrev: 0a4de651122984294d43f1bdae9e8517f2eee78d
Message-Id: <20211004222221.2DC973858402@sourceware.org>
Date: Mon, 4 Oct 2021 22:22:21 +0000 (GMT)
List-Id: Gcc-cvs mailing list

https://gcc.gnu.org/g:0a4de651122984294d43f1bdae9e8517f2eee78d

commit 0a4de651122984294d43f1bdae9e8517f2eee78d
Author: Michael Meissner
Date:   Mon Oct 4 18:21:49 2021 -0400

    Generate XXSPLTI32DX on power10.

    This patch generates XXSPLTI32DX for SF/DF floating point constants that
    cannot be generated with the XXSPLTIDP instruction. In addition, it adds
    support for using XXSPLTI32DX to load up V2DF constants, where both
    constants are the same.

    At the present time, XXSPLTI32DX is not enabled by default.

    2021-10-04  Michael Meissner

    gcc/

            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): If the constant
            can be loaded with XXSPLTI32DX, it is easy.
            (easy_vector_constant_2insns): New predicate.
            (easy_vector_constant): If the constant can be loaded with
            XXSPLTI32DX, it is easy.
            * config/rs6000/rs6000-protos.h (xxsplti32dx_constant_immediate):
            New declaration.
            * config/rs6000/rs6000.c (xxsplti32dx_constant_immediate): New
            helper function.
            (output_vec_const_move): If the operand can be loaded with
            XXSPLTI32DX, split it.
            (rs6000_output_move_128bit): Likewise.
            (prefixed_xxsplti_p): Constants loaded with XXSPLTI32DX are
            prefixed.
            * config/rs6000/rs6000.md (movsf_hardfloat): Add support for
            constants loaded with XXSPLTI32DX.
            (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
            (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxsplti32dx): New option.
            * config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX_CONST): New unspec.
            (vsx_mov<mode>_64bit): Add support for constants loaded with
            XXSPLTI32DX.
            (vsx_mov<mode>_32bit): Likewise.
            (XXSPLTI32DX): New mode iterator.
            (splitter for XXSPLTI32DX): Add splitter for constants loaded
            with XXSPLTI32DX.
            (xxsplti32dx_<mode>_first): New insns.
            (xxsplti32dx_<mode>_second): New insns.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.

    gcc/testsuite/

            * gcc.target/powerpc/vec-splat-constant-df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df-2.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di-2.c: New test.

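The word split that the new helper performs can be sketched outside the compiler. This is a minimal standalone illustration, not code from the patch: `double_to_bits` and `split_words` are hypothetical names, and the sketch assumes the host `double` is IEEE 754 binary64 with the same byte order as the host's 64-bit integers. It mirrors what xxsplti32dx_constant_immediate is described as returning: the upper and lower 32-bit words of the 64-bit constant, one per XXSPLTI32DX instruction.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the patch): return the bit pattern of a
   double.  memcpy is used to avoid strict-aliasing problems.  Assumes an
   IEEE 754 binary64 host double.  */
uint64_t
double_to_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Hypothetical helper (not in the patch): split a 64-bit constant into
   the two 32-bit immediates, upper word in *HIGH and lower word in *LOW.  */
void
split_words (uint64_t bits, uint32_t *high, uint32_t *low)
{
  *high = (uint32_t) (bits >> 32);
  *low = (uint32_t) (bits & 0xffffffffu);
}
```

For M_PI, whose binary64 bit pattern is 0x400921FB54442D18, this yields a high word of 0x400921FB and a low word of 0x54442D18, so both words need their own XXSPLTI32DX.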
Diff: --- gcc/config/rs6000/constraints.md | 6 ++ gcc/config/rs6000/predicates.md | 62 +++++++++++ gcc/config/rs6000/rs6000-protos.h | 1 + gcc/config/rs6000/rs6000.c | 65 ++++++++++- gcc/config/rs6000/rs6000.md | 119 ++++++++++++++++----- gcc/config/rs6000/rs6000.opt | 5 + gcc/config/rs6000/vsx.md | 107 +++++++++++++++--- gcc/doc/md.texi | 3 + .../gcc.target/powerpc/vec-splat-constant-df-2.c | 24 +++++ .../gcc.target/powerpc/vec-splat-constant-di-2.c | 38 +++++++ .../gcc.target/powerpc/vec-splat-constant-v2df-2.c | 24 +++++ .../gcc.target/powerpc/vec-splat-constant-v2di-2.c | 29 +++++ 12 files changed, 439 insertions(+), 44 deletions(-) diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index 46daeb0861c..f9d1d1ab446 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -213,6 +213,12 @@ "A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction." (match_operand 0 "easy_fp_constant_64bit_scalar")) +;; DImode, DFmode, V2DImode, V2DFmode constant that can be loaded with 2 +;; XXSPLTI32DX instructions. +(define_constraint "eD" + "A constant that can be loaded with a pair of XXSPLTI32DX instructions." + (match_operand 0 "easy_vector_constant_2insns")) + ;; 34-bit signed integer constant (define_constraint "eI" "A signed 34-bit integer constant if prefixed instructions are supported." diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 78e64a8a1d4..c8f0d62d75b 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -611,6 +611,11 @@ if (easy_fp_constant_ieee128 (op, mode)) return 1; + /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can be + loaded with a pair of instructions. */ + if (easy_vector_constant_2insns (op, mode)) + return 1; + /* Otherwise consider floating point constants hard, so that the constant gets pushed to memory during the early RTL phases. 
This has the advantage that double precision constants that can be @@ -751,6 +756,60 @@ return easy_fp_constant_64bit_scalar (op, GET_MODE_INNER (mode)); }) +;; Return 1 if the operand is either a DImode/DFmode scalar constant or +;; V2DImode/V2DFmode vector constant that needs 2 XXSPLTI32DX instructions to +;; load the value + +(define_predicate "easy_vector_constant_2insns" + (match_code "const_vector,vec_duplicate,const_int,const_double") +{ + /* Can we do the XXSPLTI32DX instruction? */ + if (!TARGET_XXSPLTI32DX || !TARGET_PREFIXED || !TARGET_VSX) + return false; + + if (mode == VOIDmode) + mode = GET_MODE (op); + + /* Convert vector constant/duplicate into a scalar. */ + if (CONST_VECTOR_P (op)) + { + if (!CONST_VECTOR_DUPLICATE_P (op)) + return false; + + op = CONST_VECTOR_ELT (op, 0); + mode = GET_MODE_INNER (mode); + } + + else if (GET_CODE (op) == VEC_DUPLICATE) + { + op = XEXP (op, 0); + mode = GET_MODE_INNER (mode); + } + + if (GET_MODE_SIZE (mode) > 8) + return false; + + /* 0.0 or 0 is easy to generate. */ + if (op == CONST0_RTX (mode)) + return false; + + /* If we can load up the constant in other ways (either a single load + constant and a direct move or XXSPLTIDP), don't generate the + XXSPLTI32DX. */ + if (CONST_INT_P (op)) + return !(satisfies_constraint_I (op) + || satisfies_constraint_L (op) + || satisfies_constraint_eI (op) + || easy_fp_constant_64bit_scalar (op, mode)); + + /* For floating point, if we can use XXSPLTIDP, we don't want to + generate XXSPLTI32DX's. */ + else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode)) + return !easy_fp_constant_64bit_scalar (op, mode); + + return false; +}) + ;; Return 1 if the operand is a constant that can be loaded with the XXSPLTIW ;; instruction that loads up a 32-bit immediate and splats it into the vector. 
@@ -975,6 +1034,9 @@ if (easy_vector_constant_splat_word (op, mode)) return true; + if (easy_vector_constant_2insns (op, mode)) + return 1; + if (TARGET_P9_VECTOR && xxspltib_constant_p (op, mode, &num_insns, &value)) return true; diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 540c401e7ad..f517624cc56 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int, extern int easy_altivec_constant (rtx, machine_mode); extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); +extern void xxsplti32dx_constant_immediate (rtx, machine_mode, long *, long *); extern long xxspltidp_constant_immediate (rtx, machine_mode); extern long xxspltiw_constant_immediate (rtx, machine_mode); extern int lxvkq_constant_immediate (rtx, machine_mode); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 79123f4e834..f5ca8eb1703 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -6946,6 +6946,59 @@ xxspltib_constant_p (rtx op, return true; } +/* Return the two 32-bit constants to use in the two XXSPLTI32DX instructions + via HIGH_PTR and LOW_PTR. 
*/ + +void +xxsplti32dx_constant_immediate (rtx op, + machine_mode mode, + long *high_ptr, + long *low_ptr) +{ + gcc_assert (easy_vector_constant_2insns (op, mode)); + + if (mode == VOIDmode) + mode = GET_MODE (op); + + if (CONST_VECTOR_P (op)) + { + op = CONST_VECTOR_ELT (op, 0); + mode = GET_MODE_INNER (mode); + } + + else if (GET_CODE (op) == VEC_DUPLICATE) + { + op = XEXP (op, 0); + mode = GET_MODE_INNER (mode); + } + + if (CONST_INT_P (op)) + { + HOST_WIDE_INT value = INTVAL (op); + *high_ptr = (value >> 32) & 0xffffffff; + *low_ptr = value & 0xffffffff; + return; + } + + else if (CONST_DOUBLE_P (op) && (mode == SFmode || mode == DFmode)) + { + long high_low[2]; + const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (op); + REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low); + + /* The double precision value is laid out in memory order. We need to + undo this for XXSPLTI32DX. */ + if (!BYTES_BIG_ENDIAN) + std::swap (high_low[0], high_low[1]); + + *high_ptr = high_low[0] & 0xffffffff; + *low_ptr = high_low[1] & 0xffffffff; + return; + } + + gcc_unreachable (); +} + /* Return the immediate value used in the XXSPLTIDP instruction. 
*/ long @@ -7229,6 +7282,9 @@ output_vec_const_move (rtx *operands) return "lxvkq %x0,%2"; } + if (easy_vector_constant_2insns (vec, mode)) + return "#"; + if (TARGET_P9_VECTOR && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) { @@ -14077,6 +14133,9 @@ rs6000_output_move_128bit (rtx operands[]) return "lxvkq %x0,%2"; } + else if (dest_vsx_p && easy_vector_constant_2insns (src, mode)) + return "#"; + else if (dest_regno >= 0 && (CONST_INT_P (src) || CONST_WIDE_INT_P (src) @@ -26991,11 +27050,13 @@ prefixed_xxsplti_p (rtx_insn *insn) case E_DImode: case E_DFmode: case E_SFmode: - return easy_fp_constant_64bit_scalar (src, mode); + return (easy_fp_constant_64bit_scalar (src, mode) + || easy_vector_constant_2insns (src, mode)); case E_V2DImode: case E_V2DFmode: - return easy_vector_constant_64bit_element (src, mode); + return (easy_vector_constant_64bit_element (src, mode) + || easy_vector_constant_2insns (src, mode)); case E_V16QImode: case E_V8HImode: diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 8afc4b2756d..5c120ef1672 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -7764,17 +7764,17 @@ ;; ;; LWZ LFS LXSSP LXSSPX STFS STXSSP ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP -;; MR MT MF NOP XXSPLTIDP +;; MR MT MF NOP XXSPLTIDP XXSPLTI32DX (define_insn "movsf_hardfloat" [(set (match_operand:SF 0 "nonimmediate_operand" "=!r, f, v, wa, m, wY, Z, m, wa, !r, f, wa, - !r, *c*l, !r, *h, wa") + !r, *c*l, !r, *h, wa, wa") (match_operand:SF 1 "input_operand" "m, m, wY, Z, f, v, wa, r, j, j, f, wa, - r, r, *h, 0, eF"))] + r, r, *h, 0, eF, eD"))] "(register_operand (operands[0], SFmode) || register_operand (operands[1], SFmode)) && TARGET_HARD_FLOAT @@ -7797,15 +7797,24 @@ mt%0 %1 mf%1 %0 nop + # #" [(set_attr "type" "load, fpload, fpload, fpload, fpstore, fpstore, fpstore, store, veclogical, integer, fpsimple, fpsimple, - *, mtjmpr, mfjmpr, *, vecperm") + *, mtjmpr, mfjmpr, *, vecperm, vecperm") (set_attr "isa" 
"*, *, p9v, p8v, *, p9v, p8v, *, *, *, *, *, - *, *, *, *, p10")]) + *, *, *, *, p10, p10") + (set_attr "max_prefixed_insns" + "*, *, *, *, *, *, + *, *, *, *, *, *, + *, *, *, *, *, 2") + (set_attr "num_insns" + "*, *, *, *, *, *, + *, *, *, *, *, *, + *, *, *, *, *, 2")]) ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ ;; FMR MR MT%0 MF%1 NOP @@ -8065,18 +8074,18 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSD STXSD XXLOR XXLXOR GPR<-0 -;; LWZ STW MR XXSPLTIDP +;; LWZ STW MR XXSPLTIDP XXSPLTI32DX (define_insn "*mov_hardfloat32" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, - Y, r, !r, wa") + Y, r, !r, wa, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , - r, Y, r, eF"))] + r, Y, r, eF, eD"))] "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8094,20 +8103,29 @@ # # # + # #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, two, - store, load, two, vecperm") + store, load, two, vecperm, vecperm") (set_attr "size" "64") (set_attr "length" "*, *, *, *, *, *, *, *, *, 8, - 8, 8, 8, *") + 8, 8, 8, *, *") + (set_attr "num_insns" + "*, *, *, *, *, + *, *, *, *, *, + *, *, *, *, 2") + (set_attr "max_prefixed_insns" + "*, *, *, *, *, + *, *, *, *, *, + *, *, *, *, 2") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, - *, *, *, p10")]) + *, *, *, p10, p10")]) ;; STW LWZ MR G-const H-const F-const @@ -8134,19 +8152,19 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSDX STXSDX XXLOR XXLXOR LI 0 ;; STD LD MR MT{CTR,LR} MF{CTR,LR} -;; NOP MFVSRD MTVSRD XXSPLTIDP +;; NOP MFVSRD MTVSRD XXSPLTIDP XXSPLTI32DX (define_insn "*mov_hardfloat64" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, YZ, r, !r, *c*l, !r, - *h, r, , wa") + *h, r, , wa, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , r, YZ, r, r, *h, - 0, , r, eF"))] + 0, , r, eF, eD"))] 
"TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8169,18 +8187,29 @@ nop mfvsrd %0,%x1 mtvsrd %x0,%1 + # #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer, store, load, *, mtjmpr, mfjmpr, - *, mfvsr, mtvsr, vecperm") + *, mfvsr, mtvsr, vecperm, vecperm") (set_attr "size" "64") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, *, *, *, *, *, - *, p8v, p8v, p10")]) + *, p8v, p8v, p10, p10") + (set_attr "num_insns" + "*, *, *, *, *, + *, *, *, *, *, + *, *, *, *, *, + *, *, *, *, 2") + (set_attr "max_prefixed_insns" + "*, *, *, *, *, + *, *, *, *, *, + *, *, *, *, *, + *, *, *, *, 2")]) ;; STD LD MR MT MF G-const ;; H-const F-const Special @@ -9228,7 +9257,7 @@ ;; a gpr into a fpr instead of reloading an invalid 'Y' address ;; GPR store GPR load GPR move FPR store FPR load FPR move -;; XXSPLTIDP +;; XXSPLTIDP XXSPLTI32DX ;; GPR const AVX store AVX store AVX load AVX load VSX move ;; P9 0 P9 -1 AVX 0/-1 VSX 0 VSX -1 P9 const ;; AVX const @@ -9236,13 +9265,13 @@ (define_insn "*movdi_internal32" [(set (match_operand:DI 0 "nonimmediate_operand" "=Y, r, r, m, ^d, ^d, - ^wa, + ^wa, ^wa, r, wY, Z, ^v, $v, ^wa, wa, wa, v, wa, *i, v, v") (match_operand:DI 1 "input_operand" "r, Y, r, ^d, m, ^d, - eF, + eF, eD, IJKnF, ^v, $v, wY, Z, ^wa, Oj, wM, OjwM, Oj, wM, wS, wB"))] @@ -9258,6 +9287,7 @@ fmr %0,%1 # # + # stxsd %1,%0 stxsdx %x1,%y0 lxsd %0,%1 @@ -9272,20 +9302,32 @@ #" [(set_attr "type" "store, load, *, fpstore, fpload, fpsimple, - vecperm, + vecperm, vecperm, *, fpstore, fpstore, fpload, fpload, veclogical, vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple, vecsimple") (set_attr "size" "64") (set_attr "length" "8, 8, 8, *, *, *, - *, + *, *, 16, *, *, *, *, *, *, *, *, *, *, 8, *") + (set_attr "num_insns" + "*, *, *, *, *, *, + *, *, + *, *, *, *, *, *, + *, *, *, *, *, *, + *") + (set_attr "max_prefixed_insns" + "*, 
*, *, *, *, *, + *, *, + *, *, *, *, *, *, + *, *, *, *, *, *, + *") (set_attr "isa" "*, *, *, *, *, *, - p10, + p10, p10, *, p9v, p7v, p9v, p7v, *, p9v, p9v, p7v, *, *, p7v, p7v")]) @@ -9321,7 +9363,7 @@ }) ;; GPR store GPR load GPR move -;; XXSPLTIDP +;; XXSPLTIDP XXSPLTI32DX ;; GPR li GPR lis GPR pli GPR # ;; FPR store FPR load FPR move ;; AVX store AVX store AVX load AVX load VSX move @@ -9332,7 +9374,7 @@ (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=YZ, r, r, - ^wa, + ^wa, ^wa, r, r, r, r, m, ^d, ^d, wY, Z, $v, $v, ^wa, @@ -9342,7 +9384,7 @@ ?r, ?wa") (match_operand:DI 1 "input_operand" "r, YZ, r, - eF, + eF, eD, I, L, eI, nF, ^d, m, ^d, ^v, $v, wY, Z, ^wa, @@ -9358,6 +9400,7 @@ ld%U1%X1 %0,%1 mr %0,%1 # + # li %0,%1 lis %0,%v1 li %0,%1 @@ -9384,7 +9427,7 @@ mtvsrd %x0,%1" [(set_attr "type" "store, load, *, - vecperm, + vecperm, vecperm, *, *, *, *, fpstore, fpload, fpsimple, fpstore, fpstore, fpload, fpload, veclogical, @@ -9395,7 +9438,7 @@ (set_attr "size" "64") (set_attr "length" "*, *, *, - *, + *, *, *, *, *, 20, *, *, *, *, *, *, *, *, @@ -9403,9 +9446,29 @@ 8, *, *, *, *, *, *") + (set_attr "num_insns" + "*, *, *, + *, 2, + *, *, *, *, + *, *, *, + *, *, *, *, *, + *, *, *, *, *, + 8, *, + *, *, *, + *, *") + (set_attr "max_prefixed_insns" + "*, *, *, + *, 2, + *, *, *, *, + *, *, *, + *, *, *, *, *, + *, *, *, *, *, + 8, *, + *, *, *, + *, *") (set_attr "isa" "*, *, *, - p10, + p10, p10, *, *, p10, *, *, *, *, p9v, p7v, p9v, p7v, *, diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index a53aad72547..898bc4e9e6e 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -640,6 +640,11 @@ mprivileged Target Var(rs6000_privileged) Init(0) Generate code that will run in privileged state. +;; Do not enable at this time. +mxxsplti32dx +Target Undocumented Var(TARGET_XXSPLTI32DX) Init(0) Save +Generate (do not generate) XXSPLTI32DX instructions. 
+ mxxspltidp Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save Generate (do not generate) XXSPLTIDP instructions. diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 712e5df0c02..cc21c454491 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -376,6 +376,7 @@ UNSPEC_XXSPLTIW UNSPEC_XXSPLTIDP UNSPEC_XXSPLTI32DX + UNSPEC_XXSPLTI32DX_CONST UNSPEC_XXBLEND UNSPEC_XXPERMX ]) @@ -1191,19 +1192,19 @@ ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move. ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR) -;; XXSPLTIDP XXSPLTIW LXVKQ +;; XXSPLTIDP XXSPLTIW LXVKQ XXSPLTI32DX ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW ;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) (define_insn "vsx_mov_64bit" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO, wa, wa, r, we, ?wQ, - wa, wa, wa, + wa, wa, wa, wa, ?&r, ??r, ??Y, , wa, v, ?wa, v, , wZ, v") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, we, r, r, - eV, eW, eQ, + eV, eW, eQ, eD, wQ, Y, r, r, wE, jwM, ?jwM, W, , v, wZ"))] @@ -1215,44 +1216,44 @@ } [(set_attr "type" "vecstore, vecload, vecsimple, mtvsr, mfvsr, load, - vecperm, vecperm, vecperm, + vecperm, vecperm, vecperm, vecperm, store, load, store, *, vecsimple, vecsimple, vecsimple, *, *, vecstore, vecload") (set_attr "num_insns" "*, *, *, 2, *, 2, - *, *, *, + *, *, *, 2, 2, 2, 2, 2, *, *, *, 5, 2, *, *") (set_attr "max_prefixed_insns" "*, *, *, *, *, 2, - *, *, *, + *, *, *, 2, 2, 2, 2, 2, *, *, *, *, *, *, *") (set_attr "length" "*, *, *, 8, *, 8, - *, *, *, + *, *, *, *, 8, 8, 8, 8, *, *, *, 20, 8, *, *") (set_attr "isa" ", , , *, *, *, - p10, p10, p10, + p10, p10, p10, p10, *, *, *, *, p9v, *, , *, *, *, *")]) ;; VSX store VSX load VSX move GPR load GPR store GPR move -;; XXSPLTIDP XXSPLTIW LXVKQ +;; XXSPLTIDP XXSPLTIW LXVKQ XXSPLTI32DX ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const ;; LVX (VMX) STVX (VMX) (define_insn "*vsx_mov_32bit" [(set (match_operand:VSX_M 0 
"nonimmediate_operand" "=ZwO, wa, wa, ??r, ??Y, , - wa, wa, wa, + wa, wa, wa, wa, wa, v, ?wa, v, , wZ, v") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, Y, r, r, - eV, eW, eQ, + eV, eW, eQ, eD, wE, jwM, ?jwM, W, , v, wZ"))] @@ -1264,17 +1265,27 @@ } [(set_attr "type" "vecstore, vecload, vecsimple, load, store, *, - vecperm, vecperm, vecperm, + vecperm, vecperm, vecperm, vecperm, vecsimple, vecsimple, vecsimple, *, *, vecstore, vecload") (set_attr "length" "*, *, *, 16, 16, 16, - *, *, *, + *, *, *, *, *, *, *, 20, 16, *, *") + (set_attr "num_insns" + "*, *, *, *, *, *, + *, *, *, 2, + *, *, *, *, *, + *, *") + (set_attr "length" + "*, *, *, *, *, *, + *, *, *, 2, + *, *, *, *, *, + *, *") (set_attr "isa" ", , , *, *, *, - p10, p10, p10, + p10, p10, p10, p10, p9v, *, , *, *, *, *")]) @@ -6570,6 +6581,74 @@ [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) +;; XXSPLTI32DX used to create 64-bit constants or vector constants where the +;; even elements match and the odd elements match. +(define_mode_iterator XXSPLTI32DX [DI SF DF V2DF V2DI]) + +;; Don't split DImode before register allocation, so that it has a better +;; chance of winding up in a GPR register. +(define_split + [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand") + (match_operand:XXSPLTI32DX 1 "easy_vector_constant_2insns"))] + "TARGET_POWER10 && (reload_completed || mode != DImode)" + [(set (match_dup 0) + (unspec:XXSPLTI32DX [(match_dup 2) + (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST)) + (set (match_dup 0) + (unspec:XXSPLTI32DX [(match_dup 0) + (match_dup 4) + (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))] +{ + long high = 0, low = 0; + + xxsplti32dx_constant_immediate (operands[1], mode, &high, &low); + + /* If the low bits are 0 or all 1s, initialize that word first. This way we + can use a smaller XXSPLTIB/XXLXOR/XXLORC instruction instead the first + XXSPLTI32DX. 
*/ + if (low == 0 || low == -1) + { + operands[2] = const1_rtx; + operands[3] = GEN_INT (low); + operands[4] = const0_rtx; + operands[5] = GEN_INT (high); + } + else + { + operands[2] = const0_rtx; + operands[3] = GEN_INT (high); + operands[4] = const1_rtx; + operands[5] = GEN_INT (low); + } +}) + +;; First word of XXSPLTI32DX +(define_insn "*xxsplti32dx__first" + [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa") + (unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n") + (match_operand 2 "const_int_operand" "O,wM,n")] + UNSPEC_XXSPLTI32DX_CONST))] + "TARGET_XXSPLTI32DX" + "@ + xxlxor %x0,%x0,%x0 + xxlorc %x0,%x0,%x0 + xxsplti32dx %x0,%1,%2" + [(set_attr "type" "veclogical,veclogical,vecperm") + (set_attr "prefixed" "*,*,yes")]) + +;; Second word of XXSPLTI32DX +(define_insn "*xxsplti32dx__second" + [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa") + (unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0") + (match_operand 2 "u1bit_cint_operand" "n") + (match_operand 3 "const_int_operand" "n")] + UNSPEC_XXSPLTI32DX_CONST))] + "TARGET_XXSPLTI32DX" + "xxsplti32dx %x0,%2,%3" + [(set_attr "type" "vecperm") + (set_attr "prefixed" "yes")]) + + ;; XXBLEND built-in function support (define_insn "xxblend_" [(set (match_operand:VM3 0 "register_operand" "=wa") diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 4ad0e745c94..feaa205291a 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3333,6 +3333,9 @@ The integer constant zero. A constant whose negation is a signed 16-bit constant. @end ifset +@item eD +A constant that can be loaded with a pair of XXSPLTI32DX instructions. + @item eF A 64-bit scalar constant that can be loaded with the XXSPLTIDP instruction. 
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
new file mode 100644
index 00000000000..34ec3caa594
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+#define M_PI 3.14159265358979323846
+#define SUBNORMAL 0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX. */
+
+double
+df_double_pi (void)
+{
+  return M_PI; /* 2x XXSPLTI32DX. */
+}
+
+/* This float subnormal cannot be loaded with XXSPLTIDP. */
+
+double
+v2df_double_denorm (void)
+{
+  return SUBNORMAL; /* XXLXOR, XXSPLTI32DX. */
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
new file mode 100644
index 00000000000..41b1d703fe7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di-2.c
@@ -0,0 +1,38 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX. We use asm to force the
+   value into vector registers. */
+
+#define LARGE_BITS 0x12345678ABCDEF01LL
+#define SUBNORMAL 0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP. */
+double
+scalar_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX. */
+  double d;
+  long long ll = SUBNORMAL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions. */
+double
+scalar_large_constant (void)
+{
+  /* 2x XXSPLTI32DX. */
+  double d;
+  long long ll = LARGE_BITS;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
new file mode 100644
index 00000000000..3f7b0a00655
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df-2.c
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+#define M_PI 3.14159265358979323846
+#define SUBNORMAL 0x1p-149f
+
+/* Test generation of floating point constants with XXSPLTI32DX. */
+
+vector double
+v2df_double_pi (void)
+{
+  /* 2x XXSPLTI32DX. */
+  return (vector double) { M_PI, M_PI };
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  /* XXLXOR, XXSPLTI32DX. */
+  return (vector double) { SUBNORMAL, SUBNORMAL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
new file mode 100644
index 00000000000..90027378012
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di-2.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2 -mxxsplti32dx" } */
+
+/* Test generation of integer constants loaded into the vector registers with
+   the ISA 3.1 (power10) instruction XXSPLTI32DX. */
+
+#define LARGE_BITS 0x12345678ABCDEF01LL
+#define SUBNORMAL 0x8000000000000001LL
+
+/* 0x8000000000000001LL is the bit pattern for a negative subnormal value that
+   can be generated with XXSPLTI32DX but not XXSPLTIDP. */
+vector long long
+vector_float_subnormal (void)
+{
+  /* 2x XXSPLTI32DX. */
+  return (vector long long) { SUBNORMAL, SUBNORMAL };
+}
+
+/* 0x12345678ABCDEF01LL is a large constant that can be loaded with 2x
+   XXSPLTI32DX instructions. */
+vector long long
+vector_large_constant (void)
+{
+  /* 2x XXSPLTI32DX. */
+  return (vector long long) { LARGE_BITS, LARGE_BITS };
+}
+
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */
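
The scan-assembler-times counts in the four new tests follow from the splitter's ordering rule: when the low 32-bit word is all zeros or all ones, the first instruction can be XXLXOR or XXLORC, and only the other word needs XXSPLTI32DX. A rough sketch of that count follows. `count_xxsplti32dx` is a hypothetical name, not part of the patch, and it assumes the constant has already been accepted by easy_vector_constant_2insns. It also takes the splitter comment's stated intent (low word all ones) at face value; whether the patch's own `low == -1` comparison fires for that case depends on the width of the host `long`.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): number of XXSPLTI32DX
   instructions expected for a 64-bit constant BITS that needs the
   two-instruction sequence.  */
int
count_xxsplti32dx (uint64_t bits)
{
  uint32_t low = (uint32_t) (bits & 0xffffffffu);

  /* Low word all zeros: XXLXOR goes first.  Low word all ones: XXLORC
     goes first.  Either way only the high word uses XXSPLTI32DX.  */
  if (low == 0 || low == 0xffffffffu)
    return 1;

  /* Otherwise each word is loaded with its own XXSPLTI32DX.  */
  return 2;
}
```

In the df-2.c and v2df-2.c tests, M_PI (0x400921FB54442D18) needs two instructions and 0x1p-149f converted to double (0x36A0000000000000, low word zero) needs one, giving the expected 3. In the di-2.c and v2di-2.c tests, 0x8000000000000001 and 0x12345678ABCDEF01 need two each, giving the expected 4.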