From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 1005) id E9F273858C60; Tue, 12 Oct 2021 22:05:15 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org E9F273858C60 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Michael Meissner To: gcc-cvs@gcc.gnu.org Subject: [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10. X-Act-Checkin: gcc X-Git-Author: Michael Meissner X-Git-Refname: refs/users/meissner/heads/work071 X-Git-Oldrev: 10668daf52a94a72af04466caed7a8553ad66433 X-Git-Newrev: cae098b61f80cdaa5ca91981205277021ab70871 Message-Id: <20211012220515.E9F273858C60@sourceware.org> Date: Tue, 12 Oct 2021 22:05:15 +0000 (GMT) X-BeenThere: gcc-cvs@gcc.gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gcc-cvs mailing list List-Unsubscribe: , List-Archive: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 12 Oct 2021 22:05:16 -0000 https://gcc.gnu.org/g:cae098b61f80cdaa5ca91981205277021ab70871 commit cae098b61f80cdaa5ca91981205277021ab70871 Author: Michael Meissner Date: Tue Oct 12 18:00:01 2021 -0400 Generate XXSPLTIDP on power10. This patch implements XXSPLTIDP support for SF, DF, and DI scalar constants and V2DF and V2DI vector constants. The XXSPLTIDP instruction is given a 32-bit immediate that is converted to a vector of two DFmode constants. The immediate is in SFmode format, so only constants that fit as SFmode values can be loaded with XXSPLTIDP. I added a new constraint (eD) to match scalar and vector constants that can be loaded with the XXSPLTIDP instruction. I have added a temporary switch (-mxxspltidp) to control whether or not the XXSPLTIDP instruction is generated. I added 5 new tests to test loading up SF/DF/DI scalar and V2DI/V2DF vector constants. This patch updates the previous patch to take into account the comments from the patch review. 
The main change in this patch is to look at vector constants in general to see if the bits of the vector map into values that can be loaded with XXSPLTIDP.

2021-10-12 Michael Meissner

gcc/
	* config/rs6000/constraints.md (eD): New constraint.
	* config/rs6000/predicates.md (easy_fp_constant): Add support for generating XXSPLTIDP.
	(easy_vector_constant_64bit_element): New predicate.
	(easy_vector_constant): Add support for generating XXSPLTIDP.
	* config/rs6000/rs6000-protos.h (xxspltidp_constant_immediate): New declaration.
	(convert_vector_constant_to_bytes): Likewise.
	(convert_scalar_64bit_constant_to_bytes): Likewise.
	(prefixed_xxsplti_p): Likewise.
	* config/rs6000/rs6000.c (convert_vector_constant_to_bytes): New helper function.
	(convert_scalar_64bit_constant_to_bytes): Likewise.
	(xxspltidp_constant_immediate): Likewise.
	(output_vec_const_move): Add support for XXSPLTIDP.
	(prefixed_xxsplti_p): New function.
	* config/rs6000/rs6000.md (prefixed attribute): Add support for insns that generate XXSPLTIDP.
	(movsf_hardfloat): Add support for XXSPLTIDP.
	(mov_hardfloat32, FMOVE64 iterator): Likewise.
	(mov_hardfloat64, FMOVE64 iterator): Likewise.
	(movdi_internal32): Likewise.
	(movdi_internal64): Likewise.
	* config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
	* config/rs6000/vsx.md (XXSPLTIDP): New mode iterator.
	(xxspltidp__internal): New insn.
	(XXSPLTIDP splitters): New splitters for XXSPLTIDP.
	* doc/md.texi (PowerPC and IBM RS6000 constraints): Document the eD constraint.

gcc/testsuite/
	* gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn regex for power10.
	* gcc.target/powerpc/vec-splat-constant-df.c: New test.
	* gcc.target/powerpc/vec-splat-constant-di.c: New test.
	* gcc.target/powerpc/vec-splat-constant-sf.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
	* gcc.target/powerpc/vec-splat-constant-v2di.c: New test.
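As a reading aid for the eligibility rule described in the commit message (both 64-bit halves identical, and the value expressible as an SFmode constant that is neither a NaN nor a subnormal), here is a minimal standalone sketch in C. It is not the code in predicates.md or rs6000.c. The function names `splat_bits_from_vector_bytes` and `xxspltidp_immediate_for_bits` are hypothetical, and the sketch works directly on IEEE 754 bit patterns rather than on GCC's REAL_VALUE_TYPE. It assumes the 16 input bytes are already in big endian order, as the patch's byte-conversion helpers produce them.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GCC.  Collapse 16 big endian vector bytes
   into the single 64-bit pattern that is splatted to both halves.  Return
   false if the two halves differ.  */
static bool
splat_bits_from_vector_bytes (const unsigned char bytes[16], uint64_t *bits)
{
  uint64_t upper = 0, lower = 0;

  for (size_t i = 0; i < 8; i++)
    {
      upper = (upper << 8) | bytes[i];
      lower = (lower << 8) | bytes[i + 8];
    }

  if (upper != lower)
    return false;

  *bits = upper;
  return true;
}

/* Hypothetical helper, not part of GCC.  Decide whether the IEEE 754 binary64
   pattern DF_BITS is exactly representable as a binary32 value that is
   neither a NaN nor a subnormal.  On success, store the binary32 pattern
   (the 32-bit immediate) in *IMM.  */
static bool
xxspltidp_immediate_for_bits (uint64_t df_bits, uint32_t *imm)
{
  uint32_t sign = (uint32_t) (df_bits >> 63) << 31;
  uint32_t df_exponent = (uint32_t) ((df_bits >> 52) & 0x7ff);
  uint64_t df_mantissa = df_bits & UINT64_C (0xfffffffffffff); /* 52 bits.  */

  /* All-zero bits are cheaper to build with other instructions.  */
  if (df_bits == 0)
    return false;

  /* Exponent all ones: infinity is fine, NaN is rejected.  */
  if (df_exponent == 0x7ff)
    {
      if (df_mantissa != 0)
        return false;
      *imm = sign | 0x7f800000u;
      return true;
    }

  /* Exponent all zeros: only -0.0 is left, since +0.0 was rejected above
     and a non-zero mantissa means a binary64 subnormal.  */
  if (df_exponent == 0)
    {
      if (df_mantissa != 0)
        return false;
      *imm = sign;
      return true;
    }

  /* binary32 keeps 23 mantissa bits, so the low 29 bits must be zero.  */
  if ((df_mantissa & ((UINT64_C (1) << 29) - 1)) != 0)
    return false;

  /* The unbiased exponent must lie in [-126, 127], the range of normal
     binary32 values.  Anything smaller would be a binary32 subnormal or
     underflow, anything larger would overflow.  */
  if (df_exponent < 1023 - 126 || df_exponent > 1023 + 127)
    return false;

  *imm = sign | ((df_exponent - 1023 + 127) << 23)
	 | (uint32_t) (df_mantissa >> 29);
  return true;
}
```

With this sketch, the bit pattern 0x3ff0000000000000 (1.0) maps to the immediate 0x3f800000, while 0x400921fb54442d18 (M_PI as a double) is rejected because its low mantissa bits are not zero. Small integer patterns such as 1 are also rejected, since they look like binary64 subnormals.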
Diff: --- gcc/config/rs6000/constraints.md | 5 + gcc/config/rs6000/predicates.md | 112 +++++++++ gcc/config/rs6000/rs6000-protos.h | 6 + gcc/config/rs6000/rs6000.c | 254 +++++++++++++++++++++ gcc/config/rs6000/rs6000.md | 58 +++-- gcc/config/rs6000/rs6000.opt | 4 + gcc/config/rs6000/vsx.md | 26 +++ gcc/doc/md.texi | 3 + .../gcc.target/powerpc/pr86731-fwrapv-longlong.c | 9 +- .../gcc.target/powerpc/vec-splat-constant-df.c | 60 +++++ .../gcc.target/powerpc/vec-splat-constant-di.c | 70 ++++++ .../gcc.target/powerpc/vec-splat-constant-sf.c | 60 +++++ .../gcc.target/powerpc/vec-splat-constant-v2df.c | 64 ++++++ .../gcc.target/powerpc/vec-splat-constant-v2di.c | 50 ++++ 14 files changed, 759 insertions(+), 22 deletions(-) diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md index c8cff1a3038..d26c8940104 100644 --- a/gcc/config/rs6000/constraints.md +++ b/gcc/config/rs6000/constraints.md @@ -208,6 +208,11 @@ (and (match_code "const_int") (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000"))) +;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction. +(define_constraint "eD" + "A constant that can be loaded with the XXSPLTIDP instruction." + (match_operand 0 "easy_vector_constant_64bit_element")) + ;; 34-bit signed integer constant (define_constraint "eI" "A signed 34-bit integer constant if prefixed instructions are supported." diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 956e42bc514..20da6faf6e7 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -601,6 +601,10 @@ if (TARGET_VSX && op == CONST0_RTX (mode)) return 1; + /* See if the constant can be generated with the XXSPLTIDP instruction. */ + if (easy_vector_constant_64bit_element (op, mode)) + return 1; + /* Otherwise consider floating point constants hard, so that the constant gets pushed to memory during the early RTL phases. 
This has the advantage that double precision constants that can be @@ -609,6 +613,111 @@ return 0; }) +;; Return 1 if the operand is a 64-bit vector constant that can be loaded via +;; the XXSPLTIDP instruction, which takes a SFmode value and produces a +;; V2DFmode or V2DI result. + +(define_predicate "easy_vector_constant_64bit_element" + (match_code "const_vector,vec_duplicate,const_int,const_double") +{ + unsigned char vector_bytes[16]; + + /* Can we do the XXSPLTIDP instruction? */ + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) + return false; + + /* We use DImode for integer constants and DFmode for floating point + constants, since SFmode scalars are stored as DFmode in the PowerPC. */ + if (CONST_INT_P (op) || CONST_DOUBLE_P (op)) + { + if (!convert_scalar_64bit_constant_to_bytes (op, vector_bytes, + sizeof (vector_bytes))) + return false; + + /* Change mode to value in the vector register. */ + mode = (CONST_INT_P (op)) ? E_DImode : E_DFmode; + } + + /* For vector constants, get the whole vector. */ + else if (!convert_vector_constant_to_bytes (op, mode, vector_bytes, + sizeof (vector_bytes))) + return false; + + /* The vector_bytes array has the 8 bytes of the upper and lower constants in + big endian order. Convert these into 64-bit constants. */ + unsigned HOST_WIDE_INT df_upper = 0, df_lower = 0; + for (int i = 0; i < 8; i++) + { + df_upper = (df_upper << 8) | vector_bytes[i]; + df_lower = (df_lower << 8) | vector_bytes[i+8]; + } + + /* Make sure that the two 64-bit segments are the same. */ + if (df_upper != df_lower) + return false; + + /* Avoid values that are easy to create with other instructions (0.0 for + floating point, and values that can be loaded with XXSPLTIB and sign + extension for integer. */ + if (op == CONST0_RTX (mode)) + return false; + + if (INTEGRAL_MODE_P (mode) && IN_RANGE (df_upper, -128, 127)) + return false; + + /* Avoid values that look like DFmode NaN's. 
The IEEE 754 64-bit floating + format has 1 bit for sign, 11 bits for the exponent, and 52 bits for the + mantissa. NaN values have the exponent set to all 1 bits, and the + mantissa non-zero (mantissa == 0 is infinity). */ + + int df_exponent = (df_upper >> 52) & 0x7ff; + HOST_WIDE_INT df_mantissa = df_upper & HOST_WIDE_INT_C (0x1fffffffffffff); + + if (df_exponent == 0x7ff && df_mantissa != 0) /* NaN. */ + return false; + + /* Avoid values that are DFmode subnormal values. Subnormal numbers have + the exponent all 0 bits, and the mantissa non-zero. If the value is + subnormal, then the hidden bit in the mantissa is not set. */ + if (df_exponent == 0 && df_mantissa != 0) /* subnormal. */ + return false; + + /* Change the representation to DFmode constant. */ + long df_words[2]; + df_words[0] = (df_upper >> 32) & 0xffffffff; + df_words[1] = df_upper & 0xffffffff; + + /* real_from_target takes the target words in target order. */ + if (!BYTES_BIG_ENDIAN) + std::swap (df_words[0], df_words[1]); + + REAL_VALUE_TYPE rv_type; + real_from_target (&rv_type, df_words, DFmode); + + const REAL_VALUE_TYPE *rv = &rv_type; + + /* Validate that the number can be stored as a SFmode value. */ + if (!exact_real_truncate (SFmode, rv)) + return false; + + /* Validate that the number is not a SFmode subnormal value (exponent is 0, + mantissa field is non-zero) which is undefined for the XXSPLTIDP + instruction. */ + long sf_value; + real_to_target (&sf_value, rv, SFmode); + + /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent, + and 23 bits for the mantissa. Subnormal numbers have the exponent all + 0 bits, and the mantissa non-zero. */ + long sf_exponent = (sf_value >> 23) & 0xFF; + long sf_mantissa = sf_value & 0x7FFFFF; + + if (sf_exponent == 0 && sf_mantissa != 0) + return false; + + return true; +}) + ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction. 
@@ -657,6 +766,9 @@ && xxspltib_constant_p (op, mode, &num_insns, &value)) return true; + if (easy_vector_constant_64bit_element (op, mode)) + return true; + return easy_altivec_constant (op, mode); } diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 14f6b313105..574b3d5e17e 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -32,10 +32,15 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int, extern int easy_altivec_constant (rtx, machine_mode); extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); +extern long xxspltidp_constant_immediate (rtx, machine_mode); extern int vspltis_shifted (rtx); extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); extern bool macho_lo_sum_memory_operand (rtx, machine_mode); extern int num_insns_constant (rtx, machine_mode); +extern bool convert_vector_constant_to_bytes (rtx, machine_mode, + unsigned char [], size_t); +extern bool convert_scalar_64bit_constant_to_bytes (rtx, unsigned char [], + size_t); extern int small_data_operand (rtx, machine_mode); extern bool mem_operand_gpr (rtx, machine_mode); extern bool mem_operand_ds_form (rtx, machine_mode); @@ -198,6 +203,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); +extern bool prefixed_xxsplti_p (rtx_insn *); extern void rs6000_asm_output_opcode (FILE *); extern void output_pcrel_opt_reloc (rtx); extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index acba4d9f26c..a433c7c9386 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -6502,6 +6502,179 @@ num_insns_constant (rtx op, machine_mode mode) return num_insns_constant_multi (val, mode); } +/* Convert a constant OP which has mode MODE into into 
BYTES (which is + NUM_BYTES long). Return false if the we cannot convert the constant to a + series of bytes. This code supports normal constants and vector constants. + In addition to CONST_VECTOR, we also support constant vectors formed with + VEC_DUPLICATE. Return false if we don't recognize the constant. If OP is a + scalar, it is assumed to be a vector element. + + We return the bytes in big endian order, i.e. for 128-bit vectors, byte 0 is + the most significant byte, and byte 15 is the least significant byte. */ + +bool +convert_vector_constant_to_bytes (rtx op, + machine_mode mode, + unsigned char bytes[], + size_t num_bytes) +{ + /* If we don't know the size of the constant, punt. */ + if (mode == VOIDmode) + return false; + + /* Is the buffer too small? Punt. */ + if (num_bytes < GET_MODE_SIZE (mode)) + return false; + + switch (GET_CODE (op)) + { + /* Integer constants. */ + case CONST_INT: + { + unsigned bitsize = GET_MODE_BITSIZE (mode); + unsigned HOST_WIDE_INT uvalue = UINTVAL (op); + size_t byte_num = 0; + + for (int shift = bitsize - 8; shift >= 0; shift -= 8) + bytes[byte_num++] = (uvalue >> shift) & 0xff; + + break; + } + + /* Floating point constants. */ + case CONST_DOUBLE: + { + unsigned bitsize = GET_MODE_BITSIZE (mode); + unsigned num_words = bitsize / 32; + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op); + size_t byte_num = 0; + long real_words[4]; + + /* Make sure we don't overflow the real_words array and that it is + filled completely. */ + if (bitsize > 128 || (bitsize % 32) != 0) + return false; + + real_to_target (real_words, rtype, mode); + + /* Iterate over each 32-bit word in the floating point constant. The + real_to_target function puts out words in endian fashion. We need + to arrange so the bytes are written in big endian order. */ + for (unsigned num = 0; num < num_words; num++) + { + unsigned endian_num = (BYTES_BIG_ENDIAN + ? 
num + : num_words - 1 - num); + + unsigned uvalue = real_words[endian_num]; + for (int shift = 32 - 8; shift >= 0; shift -= 8) + bytes[byte_num++] = (uvalue >> shift) & 0xff; + } + + break; + } + + /* Vector constants, iterate each element. On little endian systems, we + have to reverse the element numbers. */ + case CONST_VECTOR: + { + machine_mode ele_mode = GET_MODE_INNER (mode); + size_t nunits = GET_MODE_NUNITS (mode); + size_t size = GET_MODE_SIZE (ele_mode); + + for (size_t num = 0; num < nunits; num++) + { + rtx ele = CONST_VECTOR_ELT (op, num); + size_t byte_num = (BYTES_BIG_ENDIAN + ? num + : nunits - 1 - num) * size; + + if (!convert_vector_constant_to_bytes (ele, ele_mode, + &bytes[byte_num], size)) + return false; + } + + break; + } + + /* Vector constants, formed with VEC_DUPLICATE of a constant. */ + case VEC_DUPLICATE: + { + machine_mode ele_mode = GET_MODE_INNER (mode); + size_t nunits = GET_MODE_NUNITS (mode); + size_t size = GET_MODE_SIZE (ele_mode); + rtx ele = XEXP (op, 0); + size_t byte_num = 0; + + for (size_t num = 0; num < nunits; num++) + { + if (!convert_vector_constant_to_bytes (ele, ele_mode, + &bytes[byte_num], size)) + return false; + + byte_num += size; + } + + break; + } + + /* Any thing else, just return failure. */ + default: + return false; + } + + return true; +} + +/* Convert a CONST_INT or CONST_DOUBLE OP which has mode MODE into into BYTES + (which is NUM_BYTES long). Return false if the we cannot convert the + constant to a series of bytes. This function used for the XXSPLTIDP and + XXSPLTI32DX instructions that load up a vector register with a value into + the upper 64-bits of the vector register and then is splatted to the lower + 64-bits. + + We return the bytes in big endian order, i.e. for 128-bit vectors, byte 0 is + the most significant byte, and byte 15 is the least significant byte. 
*/ + +bool +convert_scalar_64bit_constant_to_bytes (rtx op, + unsigned char bytes[], + size_t num_bytes) +{ + machine_mode mode; + + /* We use DImode for integer constants and DFmode for floating point + constants, since SFmode scalars are stored as DFmode in the PowerPC. */ + if (CONST_INT_P (op)) + mode = DImode; + + else if (CONST_DOUBLE_P (op)) + { + if (GET_MODE (op) != SFmode && GET_MODE (op) != DFmode) + return false; + + mode = DFmode; + } + + else + return false; + + /* Verify that the buffer is either scalar sized (64-bits) or vector + sized. */ + if (num_bytes != 8 && num_bytes != 16) + return false; + + if (!convert_vector_constant_to_bytes (op, mode, bytes, 8)) + return false; + + /* If the caller wanted the bytes in a vector size, duplicate the bytes to + mimic the behavior of the XXSPLTIDP and XXSPLTI32DX instructions. */ + if (num_bytes == 16) + memcpy (&bytes[8], &bytes[0], 8); + + return true; +} + /* Interpret element ELT of the CONST_VECTOR OP as an integer value. If the mode of OP is MODE_VECTOR_INT, this simply returns the corresponding element of the vector, but for V4SFmode, the @@ -6946,6 +7119,58 @@ xxspltib_constant_p (rtx op, return true; } +/* Return the 32-bit immediate value that is used for the XXSPLTIDP instruction + to load a DFmode value that is splatted into a 128-bit vector. */ + +long +xxspltidp_constant_immediate (rtx op, machine_mode mode) +{ + long ret; + unsigned char vector_bytes[16]; + + gcc_assert (easy_vector_constant_64bit_element (op, mode)); + + /* We use DImode for integer constants and DFmode for floating point + constants, since SFmode scalars are stored as DFmode in the PowerPC. */ + if (CONST_INT_P (op) || CONST_DOUBLE_P (op)) + { + if (!convert_scalar_64bit_constant_to_bytes (op, vector_bytes, + sizeof (vector_bytes))) + gcc_unreachable (); + + /* Change mode to value in the vector register. */ + mode = CONST_INT_P (op) ? E_DImode : E_DFmode; + } + + /* For vector constants, get the whole vector. 
*/ + else if (!convert_vector_constant_to_bytes (op, mode, vector_bytes, + sizeof (vector_bytes))) + gcc_unreachable (); + + /* The vector_bytes array has the 8 bytes of the upper and lower constants in + big endian order. Convert the upper 8 bytes into a 64-bit constant. The + 64-bit constant is represented by a pair of 32-bit constants. Then + convert it to a DFmode constant. The real value support functions take + things in target endian order, so we will need to swap things on little + endian. */ + long df_upper = 0, df_lower = 0; + for (int i = 0; i < 4; i++) + { + df_upper = (df_upper << 8) | vector_bytes[i]; + df_lower = (df_lower << 8) | vector_bytes[i+4]; + } + + if (!BYTES_BIG_ENDIAN) + std::swap (df_upper, df_lower); + + long df_words[2] = { df_upper, df_lower }; + REAL_VALUE_TYPE r; + real_from_target (&r, &df_words[0], DFmode); + real_to_target (&ret, &r, SFmode); + + return ret; +} + const char * output_vec_const_move (rtx *operands) { @@ -6990,6 +7215,12 @@ output_vec_const_move (rtx *operands) gcc_unreachable (); } + if (easy_vector_constant_64bit_element (vec, mode)) + { + operands[2] = GEN_INT (xxspltidp_constant_immediate (vec, mode)); + return "xxspltidp %x0,%2"; + } + if (TARGET_P9_VECTOR && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) { @@ -26724,6 +26955,29 @@ prefixed_paddi_p (rtx_insn *insn) return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL); } +/* Whether a permute type instruction is a prefixed XXSPLTI* instruction. + This is called from the prefixed attribute processing. 
*/ + +bool +prefixed_xxsplti_p (rtx_insn *insn) +{ + rtx set = single_set (insn); + if (!set) + return false; + + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (dest); + + if (!REG_P (dest) && !SUBREG_P (dest)) + return false; + + if (easy_vector_constant_64bit_element (src, mode)) + return true; + + return false; +} + /* Whether the next instruction needs a 'p' prefix issued before the instruction is printed out. */ static bool prepend_p_to_next_insn; diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 6bec2bddbde..9ac5b8df173 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -314,6 +314,11 @@ (eq_attr "type" "integer,add") (if_then_else (match_test "prefixed_paddi_p (insn)") + (const_string "yes") + (const_string "no")) + + (eq_attr "type" "vecperm") + (if_then_else (match_test "prefixed_xxsplti_p (insn)") (const_string "yes") (const_string "no"))] @@ -7759,17 +7764,17 @@ ;; ;; LWZ LFS LXSSP LXSSPX STFS STXSSP ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP -;; MR MT MF NOP +;; MR MT MF NOP XXSPLTIDP (define_insn "movsf_hardfloat" [(set (match_operand:SF 0 "nonimmediate_operand" "=!r, f, v, wa, m, wY, Z, m, wa, !r, f, wa, - !r, *c*l, !r, *h") + !r, *c*l, !r, *h, wa") (match_operand:SF 1 "input_operand" "m, m, wY, Z, f, v, wa, r, j, j, f, wa, - r, r, *h, 0"))] + r, r, *h, 0, eD"))] "(register_operand (operands[0], SFmode) || register_operand (operands[1], SFmode)) && TARGET_HARD_FLOAT @@ -7791,15 +7796,16 @@ mr %0,%1 mt%0 %1 mf%1 %0 - nop" + nop + #" [(set_attr "type" "load, fpload, fpload, fpload, fpstore, fpstore, fpstore, store, veclogical, integer, fpsimple, fpsimple, - *, mtjmpr, mfjmpr, *") + *, mtjmpr, mfjmpr, *, vecperm") (set_attr "isa" "*, *, p9v, p8v, *, p9v, p8v, *, *, *, *, *, - *, *, *, *")]) + *, *, *, *, p10")]) ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ ;; FMR MR MT%0 MF%1 NOP @@ -8059,18 +8065,18 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSD STXSD XXLOR XXLXOR GPR<-0 -;; LWZ 
STW MR +;; LWZ STW MR XXSPLTIDP (define_insn "*mov_hardfloat32" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, - Y, r, !r") + Y, r, !r, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , - r, Y, r"))] + r, Y, r, eD"))] "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8087,20 +8093,21 @@ # # # + # #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, two, - store, load, two") + store, load, two, vecperm") (set_attr "size" "64") (set_attr "length" "*, *, *, *, *, *, *, *, *, 8, - 8, 8, 8") + 8, 8, 8, *") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, - *, *, *")]) + *, *, *, p10")]) ;; STW LWZ MR G-const H-const F-const @@ -8127,19 +8134,19 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSDX STXSDX XXLOR XXLXOR LI 0 ;; STD LD MR MT{CTR,LR} MF{CTR,LR} -;; NOP MFVSRD MTVSRD +;; NOP MFVSRD MTVSRD XXSPLTIDP (define_insn "*mov_hardfloat64" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, YZ, r, !r, *c*l, !r, - *h, r, ") + *h, r, , wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , r, YZ, r, r, *h, - 0, , r"))] + 0, , r, eD"))] "TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8161,18 +8168,19 @@ mf%1 %0 nop mfvsrd %0,%x1 - mtvsrd %x0,%1" + mtvsrd %x0,%1 + #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer, store, load, *, mtjmpr, mfjmpr, - *, mfvsr, mtvsr") + *, mfvsr, mtvsr, vecperm") (set_attr "size" "64") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, *, *, *, *, *, - *, p8v, p8v")]) + *, p8v, p8v, p10")]) ;; STD LD MR MT MF G-const ;; H-const F-const Special @@ -9220,6 +9228,7 @@ ;; a gpr into a fpr instead of reloading an invalid 'Y' address ;; GPR store GPR load GPR move FPR store FPR 
load FPR move +;; XXSPLTIDP ;; GPR const AVX store AVX store AVX load AVX load VSX move ;; P9 0 P9 -1 AVX 0/-1 VSX 0 VSX -1 P9 const ;; AVX const @@ -9227,11 +9236,13 @@ (define_insn "*movdi_internal32" [(set (match_operand:DI 0 "nonimmediate_operand" "=Y, r, r, m, ^d, ^d, + ^wa, r, wY, Z, ^v, $v, ^wa, wa, wa, v, wa, *i, v, v") (match_operand:DI 1 "input_operand" "r, Y, r, ^d, m, ^d, + eD, IJKnF, ^v, $v, wY, Z, ^wa, Oj, wM, OjwM, Oj, wM, wS, wB"))] @@ -9246,6 +9257,7 @@ lfd%U1%X1 %0,%1 fmr %0,%1 # + # stxsd %1,%0 stxsdx %x1,%y0 lxsd %0,%1 @@ -9260,17 +9272,20 @@ #" [(set_attr "type" "store, load, *, fpstore, fpload, fpsimple, + vecperm, *, fpstore, fpstore, fpload, fpload, veclogical, vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple, vecsimple") (set_attr "size" "64") (set_attr "length" "8, 8, 8, *, *, *, + *, 16, *, *, *, *, *, *, *, *, *, *, 8, *") (set_attr "isa" "*, *, *, *, *, *, + p10, *, p9v, p7v, p9v, p7v, *, p9v, p9v, p7v, *, *, p7v, p7v")]) @@ -9306,6 +9321,7 @@ }) ;; GPR store GPR load GPR move +;; XXSPLTIDP ;; GPR li GPR lis GPR pli GPR # ;; FPR store FPR load FPR move ;; AVX store AVX store AVX load AVX load VSX move @@ -9316,6 +9332,7 @@ (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=YZ, r, r, + ^wa, r, r, r, r, m, ^d, ^d, wY, Z, $v, $v, ^wa, @@ -9325,6 +9342,7 @@ ?r, ?wa") (match_operand:DI 1 "input_operand" "r, YZ, r, + eD, I, L, eI, nF, ^d, m, ^d, ^v, $v, wY, Z, ^wa, @@ -9339,6 +9357,7 @@ std%U0%X0 %1,%0 ld%U1%X1 %0,%1 mr %0,%1 + # li %0,%1 lis %0,%v1 li %0,%1 @@ -9365,6 +9384,7 @@ mtvsrd %x0,%1" [(set_attr "type" "store, load, *, + vecperm, *, *, *, *, fpstore, fpload, fpsimple, fpstore, fpstore, fpload, fpload, veclogical, @@ -9375,6 +9395,7 @@ (set_attr "size" "64") (set_attr "length" "*, *, *, + *, *, *, *, 20, *, *, *, *, *, *, *, *, @@ -9384,6 +9405,7 @@ *, *") (set_attr "isa" "*, *, *, + p10, *, *, p10, *, *, *, *, p9v, p7v, p9v, p7v, *, diff --git a/gcc/config/rs6000/rs6000.opt 
b/gcc/config/rs6000/rs6000.opt index 9d7878f144a..1d7ce4cc94a 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -640,6 +640,10 @@ mprivileged Target Var(rs6000_privileged) Init(0) Generate code that will run in privileged state. +mxxspltidp +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save +Generate (do not generate) XXSPLTIDP instructions. + -param=rs6000-density-pct-threshold= Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param When costing for loop vectorization, we probably need to penalize the loop body diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index bf033e31c1c..48ecb41801c 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -6458,6 +6458,32 @@ [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) +(define_mode_iterator XXSPLTIDP [DI SF DF V16QI V8HI V4SI V4SF V2DF V2DI]) + +(define_insn "*xxspltidp__internal" + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa") + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")] + UNSPEC_XXSPLTIDP))] + "TARGET_POWER10" + "xxspltidp %x0,%1" + [(set_attr "type" "vecperm") + (set_attr "prefixed" "yes")]) + +;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode +;; scalar constants and vector constants that look like DFmode floating point +;; values where both elements are the same. The constant has to be expressible +;; as a SFmode constant that is not a SFmode denormal value. 
+(define_split + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand") + (match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))] + "TARGET_POWER10" + [(set (match_dup 0) + (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))] +{ + long immediate = xxspltidp_constant_immediate (operands[1], mode); + operands[2] = GEN_INT (immediate); + }) + ;; XXSPLTI32DX built-in function support (define_expand "xxsplti32dx_v4si" [(set (match_operand:V4SI 0 "register_operand" "=wa") diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 41f1850bf6e..b9dfcaf0d44 100644 --- a/gcc/doc/md.texi +++ b/gcc/doc/md.texi @@ -3333,6 +3333,9 @@ The integer constant zero. A constant whose negation is a signed 16-bit constant. @end ifset +@item eD +A constant that can be loaded with the XXSPLTIDP instruction. + @item eI A signed 34-bit integer constant if prefixed instructions are supported. diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c index bd1502bb30a..dcb30e1d886 100644 --- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c +++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c @@ -24,11 +24,12 @@ vector signed long long splats4(void) return (vector signed long long) vec_sl(mzero, mzero); } -/* Codegen will consist of splat and shift instructions for most types. - If folding is enabled, the vec_sl tests using vector long long type will - generate a lvx instead of a vspltisw+vsld pair. */ +/* Codegen will consist of splat and shift instructions for most types. If + folding is enabled, the vec_sl tests using vector long long type will + generate a lvx instead of a vspltisw+vsld pair. On power10, it will + generate a xxspltidp instruction instead of the lvx. 
*/ /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */ /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */ -/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */ +/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c new file mode 100644 index 00000000000..8f6e176f9af --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include + +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. */ + +double +scalar_double_0 (void) +{ + return 0.0; /* XXSPLTIB or XXLXOR. */ +} + +double +scalar_double_1 (void) +{ + return 1.0; /* XXSPLTIDP. */ +} + +#ifndef __FAST_MATH__ +double +scalar_double_m0 (void) +{ + return -0.0; /* XXSPLTIDP. */ +} + +double +scalar_double_nan (void) +{ + return __builtin_nan (""); /* XXSPLTIDP. */ +} + +double +scalar_double_inf (void) +{ + return __builtin_inf (); /* XXSPLTIDP. */ +} + +double +scalar_double_m_inf (void) /* XXSPLTIDP. */ +{ + return - __builtin_inf (); +} +#endif + +double +scalar_double_pi (void) +{ + return M_PI; /* PLFD. */ +} + +double +scalar_double_denorm (void) +{ + return 0x1p-149f; /* PLFD. 
*/ +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c new file mode 100644 index 00000000000..75714d0b11d --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c @@ -0,0 +1,70 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +/* Test generating DImode constants that have the same bit pattern as DFmode + constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1 + (power10). We use asm to force the value into vector registers. */ + +double +scalar_0 (void) +{ + /* XXSPLTIB or XXLXOR. */ + double d; + long long ll = 0; + + __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll)); + return d; +} + +double +scalar_1 (void) +{ + /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D. */ + double d; + long long ll = 1; + + __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll)); + return d; +} + +/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated + with XXSPLTIDP. */ +double +scalar_float_neg_0 (void) +{ + /* XXSPLTIDP. */ + double d; + long long ll = 0x8000000000000000LL; + + __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll)); + return d; +} + +/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with + XXSPLTIDP. */ +double +scalar_float_1_0 (void) +{ + /* XXSPLTIDP. */ + double d; + long long ll = 0x3ff0000000000000LL; + + __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll)); + return d; +} + +/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated + with XXSPLTIDP. */ +double +scalar_pi (void) +{ + /* PLXV. 
*/ + double d; + long long ll = 0x400921fb54442d18LL; + + __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll)); + return d; +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c new file mode 100644 index 00000000000..72504bdfbbd --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c @@ -0,0 +1,60 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include + +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. */ + +float +scalar_float_0 (void) +{ + return 0.0f; /* XXSPLTIB or XXLXOR. */ +} + +float +scalar_float_1 (void) +{ + return 1.0f; /* XXSPLTIDP. */ +} + +#ifndef __FAST_MATH__ +float +scalar_float_m0 (void) +{ + return -0.0f; /* XXSPLTIDP. */ +} + +float +scalar_float_nan (void) +{ + return __builtin_nanf (""); /* XXSPLTIDP. */ +} + +float +scalar_float_inf (void) +{ + return __builtin_inff (); /* XXSPLTIDP. */ +} + +float +scalar_float_m_inf (void) /* XXSPLTIDP. */ +{ + return - __builtin_inff (); +} +#endif + +float +scalar_float_pi (void) +{ + return (float)M_PI; /* XXSPLTIDP. */ +} + +float +scalar_float_denorm (void) +{ + return 0x1p-149f; /* PLFS. */ +} + +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */ diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c new file mode 100644 index 00000000000..82ffc86f8aa --- /dev/null +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c @@ -0,0 +1,64 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target power10_ok } */ +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ + +#include + +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP + instruction. 
*/
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };	/* XXSPLTIB or XXLXOR. */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };	/* XXSPLTIDP. */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };	/* XXSPLTIDP. */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+                           __builtin_nan ("") };	/* XXSPLTIDP. */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+                           __builtin_inf () };	/* XXSPLTIDP. */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+                           - __builtin_inf () };	/* XXSPLTIDP. */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };	/* PLXV. */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+                           (double)0x1p-149f };	/* PLXV. */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10). */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR. */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D. */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.
*/
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP. */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP. */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP. */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP. */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV. */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */