From: Michael Meissner
To: gcc-cvs@gcc.gnu.org
Subject: [gcc(refs/users/meissner/heads/work071)] Generate XXSPLTIDP on power10.
Date: Thu, 14 Oct 2021 03:24:46 +0000 (GMT)
X-Git-Refname: refs/users/meissner/heads/work071
X-Git-Oldrev: a3064386c0c96e21159f151050bdf5ffcf9a1217
X-Git-Newrev: 59aa49d6fe6278fa865e4c836ef9e19fa2fd5ae9

https://gcc.gnu.org/g:59aa49d6fe6278fa865e4c836ef9e19fa2fd5ae9

commit 59aa49d6fe6278fa865e4c836ef9e19fa2fd5ae9
Author: Michael Meissner
Date:   Wed Oct 13 23:24:22 2021 -0400

    Generate XXSPLTIDP on power10.

    This patch implements XXSPLTIDP support for SF, DF, and DI scalar
    constants and V2DF and V2DI vector constants.  The XXSPLTIDP instruction
    is given a 32-bit immediate that is converted to a vector of two DFmode
    constants.  The immediate is in SFmode format, so only constants that
    fit as SFmode values can be loaded with XXSPLTIDP.

    I added a new constraint (eD) to match scalar and vector constants that
    can be loaded with the XXSPLTIDP instruction.

    I added a temporary switch (-mxxspltidp) to control whether or not the
    XXSPLTIDP instruction is generated.

    I added 5 new tests to test loading SF/DF/DI scalar and V2DI/V2DF
    vector constants.

    This patch updates the previous patch to take into account the comments
    from the patch review.
    The main change is that this patch maps each vector and scalar constant
    to its full bit pattern and then checks whether the XXSPLTIDP
    instruction can generate those bits, even if the values in the vector
    aren't DFmode constants.

    This patch also provides some framework that will be used in future
    patches adding LXVKQ and XXSPLTIW support (and possibly XXSPLTI32DX).
    This way, for instance, once easy_fp_constant and easy_vector_constant
    have built the 128 bits of the constant to check whether it can be
    generated by XXSPLTIDP, later tests can reuse them instead of rebuilding
    the vector each time.

    While the PowerPC is currently limited to 128-bit vectors, I have
    written the code so it can be changed in the future if we ever have
    larger vector sizes.

    2021-10-13  Michael Meissner

    gcc/

            * config/rs6000/constraints.md (eD): New constraint.
            * config/rs6000/predicates.md (easy_fp_constant): Add support for
            generating XXSPLTIDP.
            (easy_vector_constant_64bit_element): New predicate.
            (easy_vector_constant): Add support for generating XXSPLTIDP.
            * config/rs6000/rs6000-protos.h (prefixed_xxsplti_p): New
            declaration.
            (VECTOR_CONST_*): New macros.
            (rs6000_vec_const): New structure to hold information about vector
            constants.
            (vec_const_to_bytes): New function.
            (vec_const_use_xxspltidp): New function.
            * config/rs6000/rs6000.c (output_vec_const_move): Add support for
            XXSPLTIDP.
            (prefixed_xxsplti_p): New function.
            (vec_const_integer): New helper function.
            (vec_const_floating_point): New helper function.
            (vec_const_use_xxspltidp): New function.
            (vec_const_to_bytes): New function.
            * config/rs6000/rs6000.md (prefixed attribute): Add support for
            insns that generate XXSPLTIDP.
            (movsf_hardfloat): Add support for XXSPLTIDP.
            (mov_hardfloat32, FMOVE64 iterator): Likewise.
            (mov_hardfloat64, FMOVE64 iterator): Likewise.
            (movdi_internal32): Likewise.
            (movdi_internal64): Likewise.
            * config/rs6000/rs6000.opt (-mxxspltidp): New debug option.
            * config/rs6000/vsx.md (vsx_mov_64bit): Add support for
            XXSPLTIDP.
            (vsx_mov_32bit): Likewise.
            (XXSPLTIDP): New mode iterator.
            (xxspltidp__internal): New insn.
            (XXSPLTIDP splitters): New splitters for XXSPLTIDP.
            * doc/md.texi (PowerPC and IBM RS6000 constraints): Document the
            eD constraint.

    gcc/testsuite/

            * gcc.target/powerpc/pr86731-fwrapv-longlong.c: Update insn
            regex for power10.
            * gcc.target/powerpc/vec-splat-constant-df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-di.c: New test.
            * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
            * gcc.target/powerpc/vec-splat-constant-v2di.c: New test.

Diff:
---
 gcc/config/rs6000/constraints.md                   |   5 +
 gcc/config/rs6000/predicates.md                    |  40 +++
 gcc/config/rs6000/rs6000-protos.h                  |  26 ++
 gcc/config/rs6000/rs6000.c                         | 325 +++++++++++++++++++++
 gcc/config/rs6000/rs6000.md                        |  58 ++--
 gcc/config/rs6000/rs6000.opt                       |   4 +
 gcc/config/rs6000/vsx.md                           |  58 +++-
 gcc/doc/md.texi                                    |   3 +
 .../gcc.target/powerpc/pr86731-fwrapv-longlong.c   |   9 +-
 .../gcc.target/powerpc/vec-splat-constant-df.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-di.c     |  70 +++++
 .../gcc.target/powerpc/vec-splat-constant-sf.c     |  60 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2df.c   |  64 ++++
 .../gcc.target/powerpc/vec-splat-constant-v2di.c   |  50 ++++
 14 files changed, 796 insertions(+), 36 deletions(-)

diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index c8cff1a3038..d26c8940104 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,11 @@
   (and (match_code "const_int")
        (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
 
+;; A scalar or vector constant that can be loaded with the XXSPLTIDP instruction.
+(define_constraint "eD"
+  "A constant that can be loaded with the XXSPLTIDP instruction."
+ (match_operand 0 "easy_vector_constant_64bit_element")) + ;; 34-bit signed integer constant (define_constraint "eI" "A signed 34-bit integer constant if prefixed instructions are supported." diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 956e42bc514..ddad7ca3ae9 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -601,6 +601,16 @@ if (TARGET_VSX && op == CONST0_RTX (mode)) return 1; + /* See if the constant can be generated with the ISA 3.1 + instructions. */ + rs6000_vec_const vec_const; + + if (vec_const_to_bytes (op, mode, &vec_const)) + { + if (vec_const_use_xxspltidp (&vec_const)) + return true; + } + /* Otherwise consider floating point constants hard, so that the constant gets pushed to memory during the early RTL phases. This has the advantage that double precision constants that can be @@ -609,6 +619,26 @@ return 0; }) +;; Return 1 if the operand is a 64-bit vector constant that can be loaded via +;; the XXSPLTIDP instruction, which takes a SFmode value and produces a +;; V2DFmode or V2DI result. + +(define_predicate "easy_vector_constant_64bit_element" + (match_code "const_vector,vec_duplicate,const_int,const_double") +{ + rs6000_vec_const vec_const; + + /* Can we do the XXSPLTIDP instruction? */ + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) + return false; + + /* Convert the vector constant to bytes. */ + if (!vec_const_to_bytes (op, mode, &vec_const)) + return false; + + return vec_const_use_xxspltidp (&vec_const); +}) + ;; Return 1 if the operand is a constant that can loaded with a XXSPLTIB ;; instruction and then a VUPKHSB, VECSB2W or VECSB2D instruction. @@ -657,6 +687,16 @@ && xxspltib_constant_p (op, mode, &num_insns, &value)) return true; + /* See if the constant can be generated with the ISA 3.1 + instructions. 
*/ + rs6000_vec_const vec_const; + + if (vec_const_to_bytes (op, mode, &vec_const)) + { + if (vec_const_use_xxspltidp (&vec_const)) + return true; + } + return easy_altivec_constant (op, mode); } diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 14f6b313105..da9502bcb33 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -198,6 +198,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode); extern bool prefixed_load_p (rtx_insn *); extern bool prefixed_store_p (rtx_insn *); extern bool prefixed_paddi_p (rtx_insn *); +extern bool prefixed_xxsplti_p (rtx_insn *); extern void rs6000_asm_output_opcode (FILE *); extern void output_pcrel_opt_reloc (rtx); extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); @@ -222,6 +223,31 @@ address_is_prefixed (rtx addr, return (iform == INSN_FORM_PREFIXED_NUMERIC || iform == INSN_FORM_PCREL_LOCAL); } + +/* Functions and data structures relating to 128-bit vector constants. All + fields are kept in big endian order. */ +#define VECTOR_CONST_BITS 128 +#define VECTOR_CONST_BYTES (VECTOR_CONST_BITS / 8) +#define VECTOR_CONST_16BIT (VECTOR_CONST_BITS / 16) +#define VECTOR_CONST_32BIT (VECTOR_CONST_BITS / 32) +#define VECTOR_CONST_64BIT (VECTOR_CONST_BITS / 64) + +typedef struct { + /* Vector constant as various sized items. */ + unsigned char bytes[VECTOR_CONST_BYTES]; + unsigned short h_words[VECTOR_CONST_16BIT]; + unsigned int words[VECTOR_CONST_32BIT]; + unsigned HOST_WIDE_INT d_words[VECTOR_CONST_64BIT]; + machine_mode orig_mode; /* Original mode. */ + enum rtx_code orig_code; /* Original rtx code. */ + bool is_xxspltidp; /* Use XXSPLTIDP to load constant. */ + machine_mode xxspltidp_mode; /* Mode to use for XXSPLTIDP. */ + unsigned int xxspltidp_immediate; /* Immediate value for XXSPLTIDP. */ + bool is_prefixed; /* Prefixed instruction used. 
*/ +} rs6000_vec_const; + +extern bool vec_const_to_bytes (rtx, machine_mode, rs6000_vec_const *); +extern bool vec_const_use_xxspltidp (rs6000_vec_const *); #endif /* RTX_CODE */ #ifdef TREE_CODE diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index acba4d9f26c..05b2691d38a 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -6990,6 +6990,16 @@ output_vec_const_move (rtx *operands) gcc_unreachable (); } + rs6000_vec_const vec_const; + if (vec_const_to_bytes (vec, mode, &vec_const)) + { + if (vec_const_use_xxspltidp (&vec_const)) + { + operands[2] = GEN_INT (vec_const.xxspltidp_immediate); + return "xxspltidp %x0,%2"; + } + } + if (TARGET_P9_VECTOR && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) { @@ -26724,6 +26734,44 @@ prefixed_paddi_p (rtx_insn *insn) return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL); } +/* Whether a permute type instruction is a prefixed XXSPLTI* instruction. + This is called from the prefixed attribute processing. */ + +bool +prefixed_xxsplti_p (rtx_insn *insn) +{ + rtx set = single_set (insn); + if (!set) + return false; + + rtx dest = SET_DEST (set); + rtx src = SET_SRC (set); + machine_mode mode = GET_MODE (dest); + + if (!REG_P (dest) && !SUBREG_P (dest)) + return false; + + if (GET_CODE (src) == UNSPEC) + { + int unspec = XINT (src, 1); + return (unspec == UNSPEC_XXSPLTIW + || unspec == UNSPEC_XXSPLTIDP + || unspec == UNSPEC_XXSPLTI32DX); + } + + rs6000_vec_const vec_const; + if (vec_const_to_bytes (src, mode, &vec_const)) + { + if (vec_const.is_prefixed) + return true; + + if (vec_const_use_xxspltidp (&vec_const)) + return true; + } + + return false; +} + /* Whether the next instruction needs a 'p' prefix issued before the instruction is printed out. 
*/ static bool prepend_p_to_next_insn; @@ -28587,6 +28635,283 @@ rs6000_output_addr_vec_elt (FILE *file, int value) fprintf (file, "\n"); } + +/* Copy an integer constant to the vector constant structure. */ + +static void +vec_const_integer (rtx op, + machine_mode mode, + size_t byte_num, + rs6000_vec_const *vec_const) +{ + unsigned HOST_WIDE_INT uvalue = UINTVAL (op); + unsigned bitsize = GET_MODE_BITSIZE (mode); + + for (int shift = bitsize - 8; shift >= 0; shift -= 8) + vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff; +} + +/* Copy an floating point constant to the vector constant structure. */ + +static void +vec_const_floating_point (rtx op, + machine_mode mode, + size_t byte_num, + rs6000_vec_const *vec_const) +{ + unsigned bitsize = GET_MODE_BITSIZE (mode); + unsigned num_words = bitsize / 32; + const REAL_VALUE_TYPE *rtype = CONST_DOUBLE_REAL_VALUE (op); + long real_words[VECTOR_CONST_32BIT]; + + /* Make sure we don't overflow the real_words array and that it is + filled completely. */ + gcc_assert (bitsize <= VECTOR_CONST_BITS && (bitsize % 32) == 0); + + real_to_target (real_words, rtype, mode); + + /* Iterate over each 32-bit word in the floating point constant. The + real_to_target function puts out words in endian fashion. We need + to arrange so the words are written in big endian order. */ + for (unsigned num = 0; num < num_words; num++) + { + unsigned endian_num = (BYTES_BIG_ENDIAN + ? num + : num_words - 1 - num); + + unsigned uvalue = real_words[endian_num]; + for (int shift = 32 - 8; shift >= 0; shift -= 8) + vec_const->bytes[byte_num++] = (uvalue >> shift) & 0xff; + } +} + +/* Determine if a vector constant can be loaded with XXSPLTIDP. If so, + fill out the fields used to generate the instruction. */ + +bool +vec_const_use_xxspltidp (rs6000_vec_const *vec_const) +{ + if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX) + return false; + + /* Make sure that the two 64-bit segments are the same. 
*/ + unsigned HOST_WIDE_INT df_upper = vec_const->d_words[0]; + unsigned HOST_WIDE_INT df_lower = vec_const->d_words[1]; + if (df_upper != df_lower) + return false; + + /* Avoid values that are easy to create with other instructions (0.0 for + floating point, and values that can be loaded with XXSPLTIB and sign + extension for integer. */ + if (df_upper == 0) + return false; + + machine_mode mode = vec_const->orig_mode; + if (mode == VOIDmode) + mode = DImode; + + if (!FLOAT_MODE_P (mode) && IN_RANGE (df_upper, -128, 127)) + return false; + + /* Avoid values that look like DFmode NaN's, except for the normal NaN bit + pattern and signalling NaN bit pattern. Recognize infinity and negative + infinity. + + The IEEE 754 64-bit floating format has 1 bit for sign, 11 bits for the + exponent, and 52 bits for the mantissa (not counting the hidden bit used + for normal numbers). NaN values have the exponent set to all 1 bits, and + the mantissa non-zero (mantissa == 0 is infinity). */ + +#define VECTOR_CONST_DF_NAN HOST_WIDE_INT_UC (0x7ff8000000000000) +#define VECTOR_CONST_DF_NANS HOST_WIDE_INT_UC (0x7ff4000000000000) +#define VECTOR_CONST_DF_INF HOST_WIDE_INT_UC (0x7ff0000000000000) +#define VECTOR_CONST_DF_NEG_INF HOST_WIDE_INT_UC (0xfff0000000000000) + + if (df_upper != VECTOR_CONST_DF_NAN + && df_upper != VECTOR_CONST_DF_NANS + && df_upper != VECTOR_CONST_DF_INF + && df_upper != VECTOR_CONST_DF_NEG_INF) + { + int df_exponent = (df_upper >> 52) & 0x7ff; + unsigned HOST_WIDE_INT df_mantissa + = df_upper & ((HOST_WIDE_INT_1U << 52) - HOST_WIDE_INT_1U); + + if (df_exponent == 0x7ff && df_mantissa != 0) /* other NaNs. */ + return false; + + /* Avoid values that are DFmode subnormal values. Subnormal numbers have + the exponent all 0 bits, and the mantissa non-zero. If the value is + subnormal, then the hidden bit in the mantissa is not set. */ + if (df_exponent == 0 && df_mantissa != 0) /* subnormal. 
*/ + return false; + } + + /* Change the representation to DFmode constant. */ + long df_words[2] = { vec_const->words[0], vec_const->words[1] }; + + /* real_from_target takes the target words in target order. */ + if (!BYTES_BIG_ENDIAN) + std::swap (df_words[0], df_words[1]); + + REAL_VALUE_TYPE rv_type; + real_from_target (&rv_type, df_words, DFmode); + + const REAL_VALUE_TYPE *rv = &rv_type; + + /* Validate that the number can be stored as a SFmode value. */ + if (!exact_real_truncate (SFmode, rv)) + return false; + + /* Validate that the number is not a SFmode subnormal value (exponent is 0, + mantissa field is non-zero) which is undefined for the XXSPLTIDP + instruction. */ + long sf_value; + real_to_target (&sf_value, rv, SFmode); + + /* IEEE 754 32-bit values have 1 bit for the sign, 8 bits for the exponent, + and 23 bits for the mantissa. Subnormal numbers have the exponent all + 0 bits, and the mantissa non-zero. */ + long sf_exponent = (sf_value >> 23) & 0xFF; + long sf_mantissa = sf_value & 0x7FFFFF; + + if (sf_exponent == 0 && sf_mantissa != 0) + return false; + + /* Record the information in the vec_const structure for XXSPLTIDP. */ + vec_const->is_xxspltidp = true; + vec_const->is_prefixed = true; + vec_const->xxspltidp_immediate = sf_value; + vec_const->xxspltidp_mode = FLOAT_MODE_P (mode) ? E_DFmode : E_DImode; + + return true; +} + +/* Convert a vector constant to an internal structure, breaking it out to + bytes, half words, words, and double words. Return true if we have + successfully broken it out. */ + +bool +vec_const_to_bytes (rtx op, + machine_mode mode, + rs6000_vec_const *vec_const) +{ + /* Initialize vec const structure. */ + memset ((void *)vec_const, 0, sizeof (rs6000_vec_const)); + + /* If we don't know the size of the constant, punt. */ + if (mode == VOIDmode) + return false; + + switch (GET_CODE (op)) + { + /* Integer constants, default to double word. 
*/ + case CONST_INT: + { + if (mode == VOIDmode) + mode = DImode; + + vec_const_integer (op, mode, 0, vec_const); + + /* Splat the constant to the rest of the vector constant structure. */ + unsigned size = GET_MODE_SIZE (mode); + gcc_assert (size <= VECTOR_CONST_BYTES); + gcc_assert ((VECTOR_CONST_BYTES % size) == 0); + + for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size) + memcpy ((void *) &vec_const->bytes[splat], + (void *) &vec_const->bytes[0], + size); + break; + } + + /* Floating point constants. */ + case CONST_DOUBLE: + { + /* SFmode stored as scalars is stored in DFmode format. */ + if (mode == SFmode) + mode = DFmode; + + vec_const_floating_point (op, mode, 0, vec_const); + + /* Splat the constant to the rest of the vector constant structure. */ + unsigned size = GET_MODE_SIZE (mode); + gcc_assert (size <= VECTOR_CONST_BYTES); + gcc_assert ((VECTOR_CONST_BYTES % size) == 0); + + for (size_t splat = size; splat < VECTOR_CONST_BYTES; splat += size) + memcpy ((void *) &vec_const->bytes[splat], + (void *) &vec_const->bytes[0], + size); + break; + } + + /* Vector constants, iterate each element. On little endian systems, we + have to reverse the element numbers. Also handle VEC_DUPLICATE. */ + case CONST_VECTOR: + case VEC_DUPLICATE: + { + machine_mode ele_mode = GET_MODE_INNER (mode); + size_t nunits = GET_MODE_NUNITS (mode); + size_t size = GET_MODE_SIZE (ele_mode); + + for (size_t num = 0; num < nunits; num++) + { + rtx ele = (GET_CODE (op) == VEC_DUPLICATE + ? XEXP (op, 0) + : CONST_VECTOR_ELT (op, num)); + size_t byte_num = (BYTES_BIG_ENDIAN + ? num + : nunits - 1 - num) * size; + + if (CONST_INT_P (ele)) + vec_const_integer (ele, ele_mode, byte_num, vec_const); + else if (CONST_DOUBLE_P (ele)) + vec_const_floating_point (ele, ele_mode, byte_num, vec_const); + else + return false; + } + + break; + } + + /* Any thing else, just return failure. */ + default: + return false; + } + + /* Pack half words together. 
*/ + for (size_t i = 0; i < VECTOR_CONST_16BIT; i++) + vec_const->h_words[i] = ((vec_const->bytes[2*i] << 8) + | vec_const->bytes[2*i + 1]); + + /* Pack words together. */ + for (size_t i = 0; i < VECTOR_CONST_32BIT; i++) + { + unsigned word = 0; + for (size_t j = 0; j < 4; j++) + word = (word << 8) | vec_const->bytes[(4*i) + j]; + + vec_const->words[i] = word; + } + + /* Pack double words together. */ + for (size_t i = 0; i < VECTOR_CONST_64BIT; i++) + { + unsigned HOST_WIDE_INT d_word = 0; + for (size_t j = 0; j < 8; j++) + d_word = (d_word << 8) | vec_const->bytes[(8*i) + j]; + + vec_const->d_words[i] = d_word; + } + + /* Remember original mode and code. */ + vec_const->orig_mode = mode; + vec_const->orig_code = GET_CODE (op); + + return true; +} + + struct gcc_target targetm = TARGET_INITIALIZER; #include "gt-rs6000.h" diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md index 6bec2bddbde..cf42b6d2058 100644 --- a/gcc/config/rs6000/rs6000.md +++ b/gcc/config/rs6000/rs6000.md @@ -314,6 +314,11 @@ (eq_attr "type" "integer,add") (if_then_else (match_test "prefixed_paddi_p (insn)") + (const_string "yes") + (const_string "no")) + + (eq_attr "type" "vecperm") + (if_then_else (match_test "prefixed_xxsplti_p (insn)") (const_string "yes") (const_string "no"))] @@ -7759,17 +7764,17 @@ ;; ;; LWZ LFS LXSSP LXSSPX STFS STXSSP ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP -;; MR MT MF NOP +;; MR MT MF NOP XXSPLTIDP (define_insn "movsf_hardfloat" [(set (match_operand:SF 0 "nonimmediate_operand" "=!r, f, v, wa, m, wY, Z, m, wa, !r, f, wa, - !r, *c*l, !r, *h") + !r, *c*l, !r, *h, wa") (match_operand:SF 1 "input_operand" "m, m, wY, Z, f, v, wa, r, j, j, f, wa, - r, r, *h, 0"))] + r, r, *h, 0, eD"))] "(register_operand (operands[0], SFmode) || register_operand (operands[1], SFmode)) && TARGET_HARD_FLOAT @@ -7791,15 +7796,16 @@ mr %0,%1 mt%0 %1 mf%1 %0 - nop" + nop + #" [(set_attr "type" "load, fpload, fpload, fpload, fpstore, fpstore, fpstore, store, veclogical, 
integer, fpsimple, fpsimple, - *, mtjmpr, mfjmpr, *") + *, mtjmpr, mfjmpr, *, vecperm") (set_attr "isa" "*, *, p9v, p8v, *, p9v, p8v, *, *, *, *, *, - *, *, *, *")]) + *, *, *, *, p10")]) ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ ;; FMR MR MT%0 MF%1 NOP @@ -8059,18 +8065,18 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSD STXSD XXLOR XXLXOR GPR<-0 -;; LWZ STW MR +;; LWZ STW MR XXSPLTIDP (define_insn "*mov_hardfloat32" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, - Y, r, !r") + Y, r, !r, wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , - r, Y, r"))] + r, Y, r, eD"))] "! TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8087,20 +8093,21 @@ # # # + # #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, two, - store, load, two") + store, load, two, vecperm") (set_attr "size" "64") (set_attr "length" "*, *, *, *, *, *, *, *, *, 8, - 8, 8, 8") + 8, 8, 8, *") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, - *, *, *")]) + *, *, *, p10")]) ;; STW LWZ MR G-const H-const F-const @@ -8127,19 +8134,19 @@ ;; STFD LFD FMR LXSD STXSD ;; LXSDX STXSDX XXLOR XXLXOR LI 0 ;; STD LD MR MT{CTR,LR} MF{CTR,LR} -;; NOP MFVSRD MTVSRD +;; NOP MFVSRD MTVSRD XXSPLTIDP (define_insn "*mov_hardfloat64" [(set (match_operand:FMOVE64 0 "nonimmediate_operand" "=m, d, d, , wY, , Z, , , !r, YZ, r, !r, *c*l, !r, - *h, r, ") + *h, r, , wa") (match_operand:FMOVE64 1 "input_operand" "d, m, d, wY, , Z, , , , , r, YZ, r, r, *h, - 0, , r"))] + 0, , r, eD"))] "TARGET_POWERPC64 && TARGET_HARD_FLOAT && (gpc_reg_operand (operands[0], mode) || gpc_reg_operand (operands[1], mode))" @@ -8161,18 +8168,19 @@ mf%1 %0 nop mfvsrd %0,%x1 - mtvsrd %x0,%1" + mtvsrd %x0,%1 + #" [(set_attr "type" "fpstore, fpload, fpsimple, fpload, fpstore, fpload, fpstore, veclogical, veclogical, integer, store, load, *, mtjmpr, mfjmpr, - *, mfvsr, mtvsr") + 
*, mfvsr, mtvsr, vecperm") (set_attr "size" "64") (set_attr "isa" "*, *, *, p9v, p9v, p7v, p7v, *, *, *, *, *, *, *, *, - *, p8v, p8v")]) + *, p8v, p8v, p10")]) ;; STD LD MR MT MF G-const ;; H-const F-const Special @@ -9220,6 +9228,7 @@ ;; a gpr into a fpr instead of reloading an invalid 'Y' address ;; GPR store GPR load GPR move FPR store FPR load FPR move +;; XXSPLTIDP ;; GPR const AVX store AVX store AVX load AVX load VSX move ;; P9 0 P9 -1 AVX 0/-1 VSX 0 VSX -1 P9 const ;; AVX const @@ -9227,11 +9236,13 @@ (define_insn "*movdi_internal32" [(set (match_operand:DI 0 "nonimmediate_operand" "=Y, r, r, m, ^d, ^d, + ^wa, r, wY, Z, ^v, $v, ^wa, wa, wa, v, wa, *i, v, v") (match_operand:DI 1 "input_operand" "r, Y, r, ^d, m, ^d, + eD, IJKnF, ^v, $v, wY, Z, ^wa, Oj, wM, OjwM, Oj, wM, wS, wB"))] @@ -9246,6 +9257,7 @@ lfd%U1%X1 %0,%1 fmr %0,%1 # + # stxsd %1,%0 stxsdx %x1,%y0 lxsd %0,%1 @@ -9260,17 +9272,20 @@ #" [(set_attr "type" "store, load, *, fpstore, fpload, fpsimple, + vecperm, *, fpstore, fpstore, fpload, fpload, veclogical, vecsimple, vecsimple, vecsimple, veclogical,veclogical,vecsimple, vecsimple") (set_attr "size" "64") (set_attr "length" "8, 8, 8, *, *, *, + *, 16, *, *, *, *, *, *, *, *, *, *, 8, *") (set_attr "isa" "*, *, *, *, *, *, + p10, *, p9v, p7v, p9v, p7v, *, p9v, p9v, p7v, *, *, p7v, p7v")]) @@ -9306,6 +9321,7 @@ }) ;; GPR store GPR load GPR move +;; XXSPLTIDP ;; GPR li GPR lis GPR pli GPR # ;; FPR store FPR load FPR move ;; AVX store AVX store AVX load AVX load VSX move @@ -9316,6 +9332,7 @@ (define_insn "*movdi_internal64" [(set (match_operand:DI 0 "nonimmediate_operand" "=YZ, r, r, + ^wa, r, r, r, r, m, ^d, ^d, wY, Z, $v, $v, ^wa, @@ -9325,6 +9342,7 @@ ?r, ?wa") (match_operand:DI 1 "input_operand" "r, YZ, r, + eD, I, L, eI, nF, ^d, m, ^d, ^v, $v, wY, Z, ^wa, @@ -9339,6 +9357,7 @@ std%U0%X0 %1,%0 ld%U1%X1 %0,%1 mr %0,%1 + # li %0,%1 lis %0,%v1 li %0,%1 @@ -9365,6 +9384,7 @@ mtvsrd %x0,%1" [(set_attr "type" "store, load, *, + vecperm, *, *, *, *, 
fpstore, fpload, fpsimple, fpstore, fpstore, fpload, fpload, veclogical, @@ -9375,6 +9395,7 @@ (set_attr "size" "64") (set_attr "length" "*, *, *, + *, *, *, *, 20, *, *, *, *, *, *, *, *, @@ -9384,6 +9405,7 @@ *, *") (set_attr "isa" "*, *, *, + p10, *, *, p10, *, *, *, *, p9v, p7v, p9v, p7v, *, diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt index 9d7878f144a..1d7ce4cc94a 100644 --- a/gcc/config/rs6000/rs6000.opt +++ b/gcc/config/rs6000/rs6000.opt @@ -640,6 +640,10 @@ mprivileged Target Var(rs6000_privileged) Init(0) Generate code that will run in privileged state. +mxxspltidp +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save +Generate (do not generate) XXSPLTIDP instructions. + -param=rs6000-density-pct-threshold= Target Undocumented Joined UInteger Var(rs6000_density_pct_threshold) Init(85) IntegerRange(0, 100) Param When costing for loop vectorization, we probably need to penalize the loop body diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index bf033e31c1c..7b2d2551c7b 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -1192,17 +1192,17 @@ ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR) ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW -;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) +;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) XXLSPLTIDP (define_insn "vsx_mov_64bit" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO, wa, wa, r, we, ?wQ, ?&r, ??r, ??Y, , wa, v, - ?wa, v, , wZ, v") + ?wa, v, , wZ, v, wa") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, we, r, r, wQ, Y, r, r, wE, jwM, - ?jwM, W, , v, wZ"))] + ?jwM, W, , v, wZ, eD"))] "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) && (register_operand (operands[0], mode) @@ -1213,37 +1213,37 @@ [(set_attr "type" "vecstore, vecload, vecsimple, mtvsr, mfvsr, load, store, load, store, *, vecsimple, vecsimple, - vecsimple, *, *, vecstore, vecload") + vecsimple, *, *, vecstore, vecload, vecperm") (set_attr 
"num_insns" "*, *, *, 2, *, 2, 2, 2, 2, 2, *, *, - *, 5, 2, *, *") + *, 5, 2, *, *, *") (set_attr "max_prefixed_insns" "*, *, *, *, *, 2, 2, 2, 2, 2, *, *, - *, *, *, *, *") + *, *, *, *, *, *") (set_attr "length" "*, *, *, 8, *, 8, 8, 8, 8, 8, *, *, - *, 20, 8, *, *") + *, 20, 8, *, *, *") (set_attr "isa" ", , , *, *, *, *, *, *, *, p9v, *, - , *, *, *, *")]) + , *, *, *, *, p10")]) ;; VSX store VSX load VSX move GPR load GPR store GPR move ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const -;; LVX (VMX) STVX (VMX) +;; LVX (VMX) STVX (VMX) XXSPLTID LXVKQ (define_insn "*vsx_mov_32bit" [(set (match_operand:VSX_M 0 "nonimmediate_operand" "=ZwO, wa, wa, ??r, ??Y, , wa, v, ?wa, v, , - wZ, v") + wZ, v, wa") (match_operand:VSX_M 1 "input_operand" "wa, ZwO, wa, Y, r, r, wE, jwM, ?jwM, W, , - v, wZ"))] + v, wZ, eD"))] "!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode) && (register_operand (operands[0], mode) @@ -1254,15 +1254,15 @@ [(set_attr "type" "vecstore, vecload, vecsimple, load, store, *, vecsimple, vecsimple, vecsimple, *, *, - vecstore, vecload") + vecstore, vecload, vecperm") (set_attr "length" "*, *, *, 16, 16, 16, *, *, *, 20, 16, - *, *") + *, *, *") (set_attr "isa" ", , , *, *, *, p9v, *, , *, *, - *, *")]) + *, *, p10")]) ;; Explicit load/store expanders for the builtin functions (define_expand "vsx_load_" @@ -6458,6 +6458,36 @@ [(set_attr "type" "vecperm") (set_attr "prefixed" "yes")]) +(define_mode_iterator XXSPLTIDP [DI SF DF V16QI V8HI V4SI V4SF V2DF V2DI]) + +(define_insn "*xxspltidp__internal" + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa") + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")] + UNSPEC_XXSPLTIDP))] + "TARGET_POWER10" + "xxspltidp %x0,%1" + [(set_attr "type" "vecperm") + (set_attr "prefixed" "yes")]) + +;; Generate the XXSPLTIDP instruction to support SFmode, DFmode, and DImode +;; scalar constants and vector constants that look like DFmode floating point +;; values where both elements are the same. 
The constant has to be expressible
+;; as a SFmode constant that is not a SFmode denormal value.
+(define_split
+  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
+        (match_operand:XXSPLTIDP 1 "easy_vector_constant_64bit_element"))]
+  "TARGET_POWER10"
+  [(set (match_dup 0)
+        (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
+{
+  rs6000_vec_const vec_const;
+  if (!vec_const_to_bytes (operands[1], mode, &vec_const)
+      || !vec_const_use_xxspltidp (&vec_const))
+    gcc_unreachable ();
+
+  operands[2] = GEN_INT (vec_const.xxspltidp_immediate);
+})
+
 ;; XXSPLTI32DX built-in function support
 (define_expand "xxsplti32dx_v4si"
   [(set (match_operand:V4SI 0 "register_operand" "=wa")
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 41f1850bf6e..b9dfcaf0d44 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -3333,6 +3333,9 @@ The integer constant zero.
 A constant whose negation is a signed 16-bit constant.
 @end ifset

+@item eD
+A constant that can be loaded with the XXSPLTIDP instruction.
+
 @item eI
 A signed 34-bit integer constant if prefixed instructions are supported.

diff --git a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
index bd1502bb30a..dcb30e1d886 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr86731-fwrapv-longlong.c
@@ -24,11 +24,12 @@ vector signed long long splats4(void)
         return (vector signed long long) vec_sl(mzero, mzero);
 }

-/* Codegen will consist of splat and shift instructions for most types.
-   If folding is enabled, the vec_sl tests using vector long long type will
-   generate a lvx instead of a vspltisw+vsld pair.  */
+/* Codegen will consist of splat and shift instructions for most types.  If
+   folding is enabled, the vec_sl tests using vector long long type will
+   generate a lvx instead of a vspltisw+vsld pair.  On power10, it will
+   generate a xxspltidp instruction instead of the lvx.  */

 /* { dg-final { scan-assembler-times {\mvspltis[bhw]\M} 0 } } */
 /* { dg-final { scan-assembler-times {\mvsl[bhwd]\M} 0 } } */
-/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mp?lxv\M|\mlxv\M|\mlxvd2x\M|\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
new file mode 100644
index 00000000000..8f6e176f9af
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include 
+
+/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+double
+scalar_double_0 (void)
+{
+  return 0.0;                   /* XXSPLTIB or XXLXOR.  */
+}
+
+double
+scalar_double_1 (void)
+{
+  return 1.0;                   /* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+double
+scalar_double_m0 (void)
+{
+  return -0.0;                  /* XXSPLTIDP.  */
+}
+
+double
+scalar_double_nan (void)
+{
+  return __builtin_nan ("");    /* XXSPLTIDP.  */
+}
+
+double
+scalar_double_inf (void)
+{
+  return __builtin_inf ();      /* XXSPLTIDP.  */
+}
+
+double
+scalar_double_m_inf (void)      /* XXSPLTIDP.  */
+{
+  return - __builtin_inf ();
+}
+#endif
+
+double
+scalar_double_pi (void)
+{
+  return M_PI;                  /* PLFD.  */
+}
+
+double
+scalar_double_denorm (void)
+{
+  return 0x1p-149f;             /* PLFD.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
new file mode 100644
index 00000000000..75714d0b11d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-di.c
@@ -0,0 +1,70 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating DImode constants that have the same bit pattern as DFmode
+   constants that can be loaded with the XXSPLTIDP instruction with the ISA 3.1
+   (power10).  We use asm to force the value into vector registers.  */
+
+double
+scalar_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  double d;
+  long long ll = 0;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+double
+scalar_1 (void)
+{
+  /* VSPLTISW/VUPKLSW or XXSPLTIB/VEXTSB2D.  */
+  double d;
+  long long ll = 1;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+double
+scalar_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x8000000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+double
+scalar_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  double d;
+  long long ll = 0x3ff0000000000000LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+double
+scalar_pi (void)
+{
+  /* PLXV.  */
+  double d;
+  long long ll = 0x400921fb54442d18LL;
+
+  __asm__ ("xxmr %x0,%x1" : "=wa" (d) : "wa" (ll));
+  return d;
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
new file mode 100644
index 00000000000..72504bdfbbd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -0,0 +1,60 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include 
+
+/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+float
+scalar_float_0 (void)
+{
+  return 0.0f;                  /* XXSPLTIB or XXLXOR.  */
+}
+
+float
+scalar_float_1 (void)
+{
+  return 1.0f;                  /* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+float
+scalar_float_m0 (void)
+{
+  return -0.0f;                 /* XXSPLTIDP.  */
+}
+
+float
+scalar_float_nan (void)
+{
+  return __builtin_nanf ("");   /* XXSPLTIDP.  */
+}
+
+float
+scalar_float_inf (void)
+{
+  return __builtin_inff ();     /* XXSPLTIDP.  */
+}
+
+float
+scalar_float_m_inf (void)       /* XXSPLTIDP.  */
+{
+  return - __builtin_inff ();
+}
+#endif
+
+float
+scalar_float_pi (void)
+{
+  return (float)M_PI;           /* XXSPLTIDP.  */
+}
+
+float
+scalar_float_denorm (void)
+{
+  return 0x1p-149f;             /* PLFS.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
new file mode 100644
index 00000000000..82ffc86f8aa
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -0,0 +1,64 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+#include 
+
+/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
+   instruction.  */
+
+vector double
+v2df_double_0 (void)
+{
+  return (vector double) { 0.0, 0.0 };          /* XXSPLTIB or XXLXOR.  */
+}
+
+vector double
+v2df_double_1 (void)
+{
+  return (vector double) { 1.0, 1.0 };          /* XXSPLTIDP.  */
+}
+
+#ifndef __FAST_MATH__
+vector double
+v2df_double_m0 (void)
+{
+  return (vector double) { -0.0, -0.0 };        /* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_nan (void)
+{
+  return (vector double) { __builtin_nan (""),
+                           __builtin_nan ("") }; /* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_inf (void)
+{
+  return (vector double) { __builtin_inf (),
+                           __builtin_inf () };  /* XXSPLTIDP.  */
+}
+
+vector double
+v2df_double_m_inf (void)
+{
+  return (vector double) { - __builtin_inf (),
+                           - __builtin_inf () }; /* XXSPLTIDP.  */
+}
+#endif
+
+vector double
+v2df_double_pi (void)
+{
+  return (vector double) { M_PI, M_PI };        /* PLXV.  */
+}
+
+vector double
+v2df_double_denorm (void)
+{
+  return (vector double) { (double)0x1p-149f,
+                           (double)0x1p-149f }; /* PLXV.  */
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
new file mode 100644
index 00000000000..4d44f943d26
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2di.c
@@ -0,0 +1,50 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target power10_ok } */
+/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
+
+/* Test generating V2DImode constants that have the same bit pattern as
+   V2DFmode constants that can be loaded with the XXSPLTIDP instruction with
+   the ISA 3.1 (power10).  */
+
+vector long long
+vector_0 (void)
+{
+  /* XXSPLTIB or XXLXOR.  */
+  return (vector long long) { 0LL, 0LL };
+}
+
+vector long long
+vector_1 (void)
+{
+  /* XXSPLTIB and VEXTSB2D.  */
+  return (vector long long) { 1LL, 1LL };
+}
+
+/* 0x8000000000000000LL is the bit pattern for -0.0, which can be generated
+   with XXSPLTIDP.  */
+vector long long
+vector_float_neg_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x8000000000000000LL, 0x8000000000000000LL };
+}
+
+/* 0x3ff0000000000000LL is the bit pattern for 1.0 which can be generated with
+   XXSPLTIDP.  */
+vector long long
+vector_float_1_0 (void)
+{
+  /* XXSPLTIDP.  */
+  return (vector long long) { 0x3ff0000000000000LL, 0x3ff0000000000000LL };
+}
+
+/* 0x400921fb54442d18LL is the bit pattern for PI, which cannot be generated
+   with XXSPLTIDP.  */
+vector long long
+scalar_pi (void)
+{
+  /* PLXV.  */
+  return (vector long long) { 0x400921fb54442d18LL, 0x400921fb54442d18LL };
+}
+
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
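The define_split comment states the rule these tests exercise: the 64-bit value must be expressible as an SFmode constant that is not an SFmode denormal, because XXSPLTIDP takes a 32-bit SFmode immediate and splats the widened DFmode result into both doublewords. The C sketch below illustrates that rule on raw bit patterns. It is not GCC's vec_const_use_xxspltidp; the names double_bits, xxspltidp_immediate and v2_xxspltidp_immediate are hypothetical, and it assumes IEEE 754 binary32/binary64 formats and that the hardware widens the immediate like an ordinary SFmode-to-DFmode conversion.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumes IEEE 754 binary64 doubles.  */
static_assert (sizeof (double) == sizeof (uint64_t), "double must be 64 bits");

/* Return the raw bit pattern of D, e.g. 1.0 -> 0x3ff0000000000000.  */
static uint64_t
double_bits (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Hypothetical helper: return the 32-bit SFmode immediate that XXSPLTIDP
   would need in order to produce the DFmode bit pattern DBITS, or -1 if the
   value is not zero, infinity, NaN or an exactly representable normal
   SFmode value.  */
static int64_t
xxspltidp_immediate (uint64_t dbits)
{
  uint32_t sign = (uint32_t) (dbits >> 63);
  uint32_t exp = (uint32_t) (dbits >> 52) & 0x7ff;
  uint64_t frac = dbits & 0xfffffffffffffULL;

  /* SFmode keeps only the top 23 of the 52 DFmode fraction bits, so the
     low 29 bits must already be zero.  */
  if ((frac & 0x1fffffffULL) != 0)
    return -1;

  uint32_t sexp;
  if (exp == 0)
    {
      /* +0.0 and -0.0 are fine; DFmode denormals are too small for SFmode.  */
      if (frac != 0)
        return -1;
      sexp = 0;
    }
  else if (exp == 0x7ff)
    sexp = 0xff;  /* Infinity or NaN.  */
  else
    {
      /* Rebias the exponent.  Anything below the SFmode normal range would
         need an SFmode denormal, which is rejected, as is overflow.  */
      int unbiased = (int) exp - 1023;
      if (unbiased < -126 || unbiased > 127)
        return -1;
      sexp = (uint32_t) (unbiased + 127);
    }

  return (int64_t) ((sign << 31) | (sexp << 23) | (uint32_t) (frac >> 29));
}

/* Hypothetical V2DI/V2DF check: one XXSPLTIDP can only produce a vector
   whose two 64-bit elements are identical.  */
static int64_t
v2_xxspltidp_immediate (uint64_t elt0, uint64_t elt1)
{
  return elt0 == elt1 ? xxspltidp_immediate (elt0) : -1;
}
```

Under these assumptions, 1.0 (0x3ff0000000000000) maps to the immediate 0x3f800000, -0.0 to 0x80000000, and (double)(float)M_PI to 0x40490fdb, while M_PI itself (0x400921fb54442d18) and 0x1p-149f are rejected, which agrees with which tests expect XXSPLTIDP. The sketch accepts +0.0 as well, but the tests show GCC preferring XXSPLTIB or XXLXOR for zero.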