public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work047)] Use XXSPLTI32DX to generate some constants.
@ 2021-04-13 18:04 Michael Meissner
From: Michael Meissner @ 2021-04-13 18:04 UTC
To: gcc-cvs
https://gcc.gnu.org/g:a34582b843b2ef84e05679f3a56d09c4f8941024
commit a34582b843b2ef84e05679f3a56d09c4f8941024
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Tue Apr 13 14:04:22 2021 -0400
Use XXSPLTI32DX to generate some constants.
This patch generates a pair of XXSPLTI32DX instructions to load 64-bit
scalar or 128-bit vector constants into the vector registers.
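As an illustration of what "a pair of XXSPLTI32DX instructions" means for a 64-bit constant, here is a minimal sketch in C, not the GCC code itself. It splits a 64-bit bit pattern into two sign-extended 32-bit immediates and picks which word to set first, mirroring the ordering rule in the split of the new `*xxsplti32dx_<mode>` pattern. The names `word_load`, `double_bits`, `sign_extend_32` and `plan_pair` are hypothetical, and the sketch assumes the host `double` is IEEE 754 binary64.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record for one instruction of the pair (not a GCC type).  */
struct word_load {
    int     index;  /* 0 is paired with the high half, 1 with the low half */
    int32_t imm;    /* sign-extended 32-bit immediate */
};

/* Bit pattern of a double.  Assumes the host double is IEEE 754 binary64.  */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Sign-extend the low 32 bits, using the same xor/subtract idiom as the
   split code: ((value & 0xffffffff) ^ 0x80000000) - 0x80000000.  */
static int32_t sign_extend_32(uint64_t bits)
{
    int64_t low = (int64_t)(bits & 0xffffffffu);
    return (int32_t)((low ^ 0x80000000LL) - 0x80000000LL);
}

/* Fill OUT with the two word loads in emission order.  If the low word is
   0 or -1 it goes first, so the first instruction can be an XXSPLTIB.  */
static void plan_pair(uint64_t bits, struct word_load out[2])
{
    assert(out != NULL);

    int32_t high = sign_extend_32(bits >> 32);
    int32_t low  = sign_extend_32(bits);

    if (low == 0 || low == -1) {
        out[0] = (struct word_load){ 1, low };
        out[1] = (struct word_load){ 0, high };
    } else {
        out[0] = (struct word_load){ 0, high };
        out[1] = (struct word_load){ 1, low };
    }
}
```

For pi, whose low word is neither 0 nor -1, the high word is loaded first and both instructions are XXSPLTI32DX. For `0x1p-149` widened to double, the low word is zero, so it is loaded first, which matches the "XXSPLTIB, XXSPLTI32DX" comment in the updated tests.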
I added a new constraint (eD) for constants that can be loaded with
XXSPLTI32DX, but cannot be loaded with the XXSPLTIDP or XXSPLTIW
instructions.
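For four-element vectors, the helper behind the new constraint accepts only constants whose even elements match and whose odd elements match, and rejects the case where all four are equal. A rough stand-alone sketch of that check on plain integers follows; `v4si_pair_pattern` is a hypothetical name, and the real code works on RTL constants rather than C arrays.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the V4SI branch of xxsplti32dx_constant_p.
   Returns true when ELT has the form { a, b, a, b } with a != b, and
   stores (a << 32) | b in *PACKED.  */
static bool v4si_pair_pattern(const int32_t elt[4], uint64_t *packed)
{
    assert(elt != NULL && packed != NULL);
    *packed = 0;

    /* Even elements must match, and odd elements must match.  */
    if (elt[0] != elt[2] || elt[1] != elt[3])
        return false;

    /* All four equal is a plain splat; the patch rejects that case here.  */
    if (elt[0] == elt[1])
        return false;

    *packed = ((uint64_t)(uint32_t)elt[0] << 32) | (uint32_t)elt[1];
    return true;
}
```

Going through `uint32_t` before widening keeps a negative odd element from smearing sign bits into the upper half, which is what the `& 0xffffffff` mask does in the patch.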
I added a debug switch (-mxxsplti32dx) to control whether this behavior is
on or off.
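The way the switch defaults is visible in the rs6000_option_override_internal hunk: on a Power10 target with VSX, each splat option is turned on unless the user set it explicitly, and otherwise all three are cleared. A simplified sketch of that logic is shown here; the mask values and the function name `resolve_splat_flags` are made up for illustration and do not correspond to GCC's actual definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mask values; GCC generates the real ones from rs6000.opt.  */
enum {
    MASK_XXSPLTI32DX = 1 << 0,
    MASK_XXSPLTIW    = 1 << 1,
    MASK_XXSPLTIDP   = 1 << 2
};

/* FLAGS holds the current option bits, EXPLICIT_FLAGS the bits the user
   set on the command line (either -mfoo or -mno-foo).  */
static uint32_t resolve_splat_flags(uint32_t flags, uint32_t explicit_flags,
                                    int power10_with_vsx)
{
    const uint32_t all = MASK_XXSPLTI32DX | MASK_XXSPLTIW | MASK_XXSPLTIDP;

    assert((flags & ~all) == 0 && (explicit_flags & ~all) == 0);

    /* Without Power10 and VSX, none of the instructions are available.  */
    if (!power10_with_vsx)
        return flags & ~all;

    /* Turn on every option the user did not mention explicitly.  */
    return flags | (all & ~explicit_flags);
}
```

With this shape, an explicit -mno-xxsplti32dx leaves that bit clear while the other two options are still enabled by default.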
For vector moves, I bumped up the size of expanding a vector constant from
5 instructions (20 bytes) to 6 instructions (24 bytes). This is to
accommodate the size of two prefixed instructions.
gcc/
2021-04-13 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/altivec.md (UNSPEC_XXSPLTI32DX): Move to vsx.md.
(xxsplti32dx_v4si): Move to vsx.md.
(xxsplti32dx_v4si_inst): Move to vsx.md.
(xxsplti32dx_v4sf): Move to vsx.md.
(xxsplti32dx_v4sf_inst): Move to vsx.md.
* config/rs6000/constraints.md (eD): New constraint.
* config/rs6000/predicates.md (easy_fp_constant): If we can load
the constant with a pair of XXSPLTI32DX instructions, it is easy.
(xxsplti32dx_operand): New predicate.
(easy_vector_constant): If we can load the constant with a pair of
XXSPLTI32DX instructions, it is easy.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mxxsplti32dx.
(POWERPC_MASKS): Add -mxxsplti32dx.
* config/rs6000/rs6000-protos.h (xxsplti32dx_constant_p): New
declaration.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
-mxxsplti32dx support.
(xxsplti32dx_constant_p): New helper function.
(output_vec_const_move): Split constants that need XXSPLTI32DX.
(rs6000_opt_masks): Add -mxxsplti32dx.
* config/rs6000/rs6000.md (movsf_hardfloat): Add support for
loading constants with XXSPLTI32DX.
(mov<mode>_hardfloat32, FMOVE64 iterator): Add support for loading
constants with XXSPLTI32DX.
(mov<mode>_hardfloat64, FMOVE64 iterator): Add support for loading
constants with XXSPLTI32DX.
* config/rs6000/rs6000.opt (-mxxsplti32dx): New switch.
* config/rs6000/vsx.md (UNSPEC_XXSPLTI32DX): Move unspec here from
altivec.md.
(UNSPEC_XXSPLTI32DX_CONST): New unspec.
(vsx_mov<mode>_64bit): Bump up size of 'W' vector constants to
accommodate a pair of XXSPLTI32DX instructions.
(vsx_mov<mode>_32bit): Bump up size of 'W' vector constants to
accommodate a pair of XXSPLTI32DX instructions.
(XXSPLTI32DX): New mode iterator.
(xxsplti32dx_<mode>): New insn and splits.
(xxsplti32dx_<mode>_first): New insns.
(xxsplti32dx_<mode>_second): New insns.
(xxsplti32dx_v4si): Move here from altivec.md.
(xxsplti32dx_v4si_inst): Move here from altivec.md.
(xxsplti32dx_v4sf): Move here from altivec.md.
(xxsplti32dx_v4sf_inst): Move here from altivec.md.
gcc/testsuite/
2021-04-13 Michael Meissner <meissner@linux.ibm.com>
* gcc.target/powerpc/vec-splati-runnable.c: Update insn count.
* gcc.target/powerpc/vec-splat-constant-sf.c: Update insn count.
* gcc.target/powerpc/vec-splat-constant-df.c: Update insn count.
* gcc.target/powerpc/vec-splat-constant-v2df.c: Update insn
count.
Diff:
---
gcc/config/rs6000/altivec.md | 60 ---------
gcc/config/rs6000/constraints.md | 6 +
gcc/config/rs6000/predicates.md | 22 ++++
gcc/config/rs6000/rs6000-cpus.def | 2 +
gcc/config/rs6000/rs6000-protos.h | 1 +
gcc/config/rs6000/rs6000.c | 133 ++++++++++++++++++-
gcc/config/rs6000/rs6000.md | 44 +++++--
gcc/config/rs6000/rs6000.opt | 4 +
gcc/config/rs6000/vsx.md | 142 ++++++++++++++++++++-
.../gcc.target/powerpc/vec-splat-constant-df.c | 9 +-
.../gcc.target/powerpc/vec-splat-constant-sf.c | 5 +-
.../gcc.target/powerpc/vec-splat-constant-v2df.c | 10 +-
.../gcc.target/powerpc/vec-splati-runnable.c | 2 +-
13 files changed, 354 insertions(+), 86 deletions(-)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index ad6ead04cfa..9af71e036ab 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -176,7 +176,6 @@
UNSPEC_VSTRIL
UNSPEC_SLDB
UNSPEC_SRDB
- UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
UNSPEC_XXPERMX
])
@@ -818,65 +817,6 @@
"vs<SLDB_lr>dbi %0,%1,%2,%3"
[(set_attr "type" "vecsimple")])
-(define_expand "xxsplti32dx_v4si"
- [(set (match_operand:V4SI 0 "register_operand" "=wa")
- (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
- (match_operand:QI 2 "u1bit_cint_operand" "n")
- (match_operand:SI 3 "s32bit_cint_operand" "n")]
- UNSPEC_XXSPLTI32DX))]
- "TARGET_POWER10"
-{
- int index = INTVAL (operands[2]);
-
- if (!BYTES_BIG_ENDIAN)
- index = 1 - index;
-
- emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
- GEN_INT (index), operands[3]));
- DONE;
-}
- [(set_attr "type" "vecsimple")])
-
-(define_insn "xxsplti32dx_v4si_inst"
- [(set (match_operand:V4SI 0 "register_operand" "=wa")
- (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
- (match_operand:QI 2 "u1bit_cint_operand" "n")
- (match_operand:SI 3 "s32bit_cint_operand" "n")]
- UNSPEC_XXSPLTI32DX))]
- "TARGET_POWER10"
- "xxsplti32dx %x0,%2,%3"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
-(define_expand "xxsplti32dx_v4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
- (match_operand:QI 2 "u1bit_cint_operand" "n")
- (match_operand:SF 3 "const_double_operand" "n")]
- UNSPEC_XXSPLTI32DX))]
- "TARGET_POWER10"
-{
- int index = INTVAL (operands[2]);
- long value = rs6000_const_f32_to_i32 (operands[3]);
- if (!BYTES_BIG_ENDIAN)
- index = 1 - index;
-
- emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
- GEN_INT (index), GEN_INT (value)));
- DONE;
-})
-
-(define_insn "xxsplti32dx_v4sf_inst"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
- (match_operand:QI 2 "u1bit_cint_operand" "n")
- (match_operand:SI 3 "s32bit_cint_operand" "n")]
- UNSPEC_XXSPLTI32DX))]
- "TARGET_POWER10"
- "xxsplti32dx %x0,%2,%3"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
(define_insn "xxblend_<mode>"
[(set (match_operand:VM3 0 "register_operand" "=wa")
(unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index e1fadd63580..d665e2a94db 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -208,6 +208,12 @@
(and (match_code "const_int")
(match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
+;; SF/DF/V2DF/DI/V2DI scalar or vector constant that can be loaded with a pair
+;; of XXSPLTI32DX instructions.
+(define_constraint "eD"
+ "A vector constant that can be loaded with XXSPLTI32DX instructions."
+ (match_operand 0 "xxsplti32dx_operand"))
+
;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
(define_constraint "eF"
"A vector constant that can be loaded with the XXSPLTIDP instruction."
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 8c461ba2b76..01e5e09e0a6 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -606,6 +606,11 @@
if (xxspltidp_operand (op, mode))
return 1;
+ /* If we have the ISA 3.1 XXSPLTI32DX instruction, see if the constant can
+ be loaded with a pair of those instructions. */
+ if (xxsplti32dx_operand (op, mode))
+ return 1;
+
/* Otherwise consider floating point constants hard, so that the
constant gets pushed to memory during the early RTL phases. This
has the advantage that double precision constants that can be
@@ -684,6 +689,20 @@
return xxspltidp_constant_p (op, mode, &value);
})
+;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
+;; loaded via a pair of ISA 3.1 XXSPLTI32DX instructions. Do not return true if
+;; the value is 0.0 or it can be loaded with XXSPLTIDP, since that is easy to
+;; generate without using XXSPLTI32DX.
+(define_predicate "xxsplti32dx_operand"
+ (match_code "const_double,const_int,const_vector,vec_duplicate")
+{
+ if (op == CONST0_RTX (mode))
+ return false;
+
+ HOST_WIDE_INT value = 0;
+ return xxsplti32dx_constant_p (op, mode, &value);
+})
+
;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
;; vector register without using memory.
(define_predicate "easy_vector_constant"
@@ -703,6 +722,9 @@
if (xxspltidp_operand (op, mode))
return true;
+ if (xxsplti32dx_operand (op, mode))
+ return true;
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (op, mode, &num_insns, &value))
return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index 3b657e490b1..b0821b34a69 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -86,6 +86,7 @@
| OPTION_MASK_P10_FUSION \
| OPTION_MASK_P10_FUSION_LD_CMPI \
| OPTION_MASK_P10_FUSION_2LOGICAL \
+ | OPTION_MASK_XXSPLTI32DX \
| OPTION_MASK_XXSPLTIDP \
| OPTION_MASK_XXSPLTIW)
@@ -163,6 +164,7 @@
| OPTION_MASK_SOFT_FLOAT \
| OPTION_MASK_STRICT_ALIGN_OPTIONAL \
| OPTION_MASK_VSX \
+ | OPTION_MASK_XXSPLTI32DX \
| OPTION_MASK_XXSPLTIDP \
| OPTION_MASK_XXSPLTIW)
#endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 5cd0b341fa6..4efed10cc53 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -33,6 +33,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
extern bool easy_altivec_constant (rtx, machine_mode);
extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
+extern bool xxsplti32dx_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
extern int vspltis_shifted (rtx);
extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 07958c82761..ca3c7cb6020 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4481,6 +4481,9 @@ rs6000_option_override_internal (bool global_init_p)
if (TARGET_POWER10 && TARGET_VSX)
{
+ if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTI32DX) == 0)
+ rs6000_isa_flags |= OPTION_MASK_XXSPLTI32DX;
+
if ((rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
@@ -4488,7 +4491,9 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags |= OPTION_MASK_XXSPLTIDP;
}
else
- rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW | OPTION_MASK_XXSPLTIDP);
+ rs6000_isa_flags &= ~(OPTION_MASK_XXSPLTIW
+ | OPTION_MASK_XXSPLTIDP
+ | OPTION_MASK_XXSPLTI32DX);
if (TARGET_DEBUG_REG || TARGET_DEBUG_TARGET)
rs6000_print_isa_options (stderr, 0, "after subtarget", rs6000_isa_flags);
@@ -6549,6 +6554,128 @@ xxspltidp_constant_p (rtx op,
return true;
}
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+ XXSPLTI32DX instruction. If the instruction can be synthesized with
+ XXSPLTIDP or is 0/-1, return false.
+
+ Return the 64-bit constant to use in the two XXSPLTI32DX instructions via
+ CONSTANT_PTR. */
+
+bool
+xxsplti32dx_constant_p (rtx op,
+ machine_mode mode,
+ HOST_WIDE_INT *constant_ptr)
+{
+ *constant_ptr = 0;
+
+ if (!TARGET_XXSPLTI32DX)
+ return false;
+
+ if (mode == VOIDmode)
+ mode = GET_MODE (op);
+
+ if (op == CONST0_RTX (mode))
+ return false;
+
+ rtx element = op;
+ if (mode == V2DFmode || mode == V2DImode)
+ {
+ /* Handle VEC_DUPLICATE and CONST_VECTOR. */
+ if (GET_CODE (op) == VEC_DUPLICATE)
+ element = XEXP (op, 0);
+
+ else if (GET_CODE (op) == CONST_VECTOR)
+ {
+ element = CONST_VECTOR_ELT (op, 0);
+ if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
+ return false;
+ }
+
+ else
+ return false;
+
+ mode = GET_MODE_INNER (mode);
+ }
+
+ else if (mode == V4SImode || mode == V4SFmode)
+ {
+ /* For V4SI/V4SF, the XXSPLTI32DX instruction pair can represent vectors
+ where the two even elements are equal and the two odd elements are
+ equal. */
+ if (GET_CODE (op) != CONST_VECTOR)
+ return false;
+
+ rtx op0 = CONST_VECTOR_ELT (op, 0);
+ if (!rtx_equal_p (op0, CONST_VECTOR_ELT (op, 2)))
+ return false;
+
+ rtx op1 = CONST_VECTOR_ELT (op, 1);
+ if (!rtx_equal_p (op1, CONST_VECTOR_ELT (op, 3)))
+ return false;
+
+ if (rtx_equal_p (op0, op1))
+ return false;
+
+ long op0_value;
+ long op1_value;
+ if (mode == V4SImode)
+ {
+ op0_value = INTVAL (op0);
+ op1_value = INTVAL (op1);
+ }
+ else
+ {
+ op0_value = rs6000_const_f32_to_i32 (op0);
+ op1_value = rs6000_const_f32_to_i32 (op1);
+ }
+
+ *constant_ptr = (op0_value << 32) | (op1_value & 0xffffffff);
+ return true;
+ }
+
+ if (GET_MODE (element) != mode)
+ return false;
+
+ /* Handle floating point constants. */
+ if (mode == SFmode || mode == DFmode)
+ {
+ HOST_WIDE_INT xxspltidp_value = 0;
+
+ if (!CONST_DOUBLE_P (element))
+ return false;
+
+ if (xxspltidp_constant_p (element, mode, &xxspltidp_value))
+ return false;
+
+ long high_low[2];
+ const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
+ REAL_VALUE_TO_TARGET_DOUBLE (*rv, high_low);
+
+ if (!BYTES_BIG_ENDIAN)
+ std::swap (high_low[0], high_low[1]);
+
+ *constant_ptr = (high_low[0] << 32) | (high_low[1] & 0xffffffff);
+ return true;
+ }
+
+ /* Handle integer constants. */
+ else if (mode == DImode)
+ {
+ if (!CONST_INT_P (element))
+ return false;
+
+ HOST_WIDE_INT value = INTVAL (element);
+ if (value == -1)
+ return false;
+
+ *constant_ptr = value;
+ return true;
+ }
+
+ else
+ return false;
+}
+
const char *
output_vec_const_move (rtx *operands)
{
@@ -6597,6 +6724,9 @@ output_vec_const_move (rtx *operands)
|| xxspltidp_operand (vec, mode))
return "#";
+ if (xxsplti32dx_operand (vec, mode))
+ return "#";
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
{
@@ -24094,6 +24224,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "string", 0, false, true },
{ "update", OPTION_MASK_NO_UPDATE, true , true },
{ "vsx", OPTION_MASK_VSX, false, true },
+ { "xxsplti32dx", OPTION_MASK_XXSPLTI32DX, false, true },
{ "xxspltiw", OPTION_MASK_XXSPLTIW, false, true },
{ "xxspltidp", OPTION_MASK_XXSPLTIDP, false, true },
#ifdef OPTION_MASK_64BIT
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1996fd1ece3..cd292a798cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7564,17 +7564,17 @@
;;
;; LWZ LFS LXSSP LXSSPX STFS STXSSP
;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP
-;; MR MT<x> MF<x> NOP XXSPLTIDP
+;; MR MT<x> MF<x> NOP XXSPLTIDP XXSPLTI32DX
(define_insn "movsf_hardfloat"
[(set (match_operand:SF 0 "nonimmediate_operand"
"=!r, f, v, wa, m, wY,
Z, m, wa, !r, f, wa,
- !r, *c*l, !r, *h, wa")
+ !r, *c*l, !r, *h, wa, wa")
(match_operand:SF 1 "input_operand"
"m, m, wY, Z, f, v,
wa, r, j, j, f, wa,
- r, r, *h, 0, eF"))]
+ r, r, *h, 0, eF, eD"))]
"(register_operand (operands[0], SFmode)
|| register_operand (operands[1], SFmode))
&& TARGET_HARD_FLOAT
@@ -7597,19 +7597,28 @@
mt%0 %1
mf%1 %0
nop
+ #
#"
[(set_attr "type"
"load, fpload, fpload, fpload, fpstore, fpstore,
fpstore, store, veclogical, integer, fpsimple, fpsimple,
- *, mtjmpr, mfjmpr, *, vecperm")
+ *, mtjmpr, mfjmpr, *, vecperm, vecperm")
(set_attr "isa"
"*, *, p9v, p8v, *, p9v,
p8v, *, *, *, *, *,
- *, *, *, *, p10")
+ *, *, *, *, p10, p10")
(set_attr "prefixed"
"*, *, *, *, *, *,
*, *, *, *, *, *,
- *, *, *, *, yes")])
+ *, *, *, *, yes, yes")
+ (set_attr "max_prefixed_insns"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, *,
+ *, *, *, *, *, 2")
+ (set_attr "num_insns"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, *,
+ *, *, *, *, *, 2")])
;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ
;; FMR MR MT%0 MF%1 NOP
@@ -7869,18 +7878,18 @@
;; STFD LFD FMR LXSD STXSD
;; LXSD STXSD XXLOR XXLXOR GPR<-0
-;; LWZ STW MR XXSPLTIDP
+;; LWZ STW MR XXSPLTIDP XXSPLTI32DX
(define_insn "*mov<mode>_hardfloat32"
[(set (match_operand:FMOVE64 0 "nonimmediate_operand"
"=m, d, d, <f64_p9>, wY,
<f64_av>, Z, <f64_vsx>, <f64_vsx>, !r,
- Y, r, !r, wa")
+ Y, r, !r, wa, wa")
(match_operand:FMOVE64 1 "input_operand"
"d, m, d, wY, <f64_p9>,
Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>,
- r, Y, r, eF"))]
+ r, Y, r, eF, eD"))]
"! TARGET_POWERPC64 && TARGET_HARD_FLOAT
&& (gpc_reg_operand (operands[0], <MODE>mode)
|| gpc_reg_operand (operands[1], <MODE>mode))"
@@ -7898,24 +7907,33 @@
#
#
#
+ #
#"
[(set_attr "type"
"fpstore, fpload, fpsimple, fpload, fpstore,
fpload, fpstore, veclogical, veclogical, two,
- store, load, two, vecperm")
+ store, load, two, vecperm, vecperm")
(set_attr "size" "64")
(set_attr "length"
"*, *, *, *, *,
*, *, *, *, 8,
- 8, 8, 8, *")
+ 8, 8, 8, *, *")
(set_attr "isa"
"*, *, *, p9v, p9v,
p7v, p7v, *, *, *,
- *, *, *, p10")
+ *, *, *, p10, p10")
(set_attr "prefixed"
"*, *, *, *, *,
*, *, *, *, *,
- *, *, *, yes")])
+ *, *, *, yes, yes")
+ (set_attr "max_prefixed_insns"
+ "*, *, *, *, *,
+ *, *, *, *, *,
+ *, *, *, *, 2")
+ (set_attr "num_insns"
+ "*, *, *, *, *,
+ *, *, *, *, *,
+ *, *, *, *, 2")])
;; STW LWZ MR G-const H-const F-const
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 6620cdb7716..bd269369ca0 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -627,3 +627,7 @@ Generate (do not generate) the XXSPLTIW instruction.
mxxspltidp
Target Undocumented Mask(XXSPLTIDP) Var(rs6000_isa_flags)
Generate (do not generate) the XXSPLTIDP instruction.
+
+mxxsplti32dx
+Target Undocumented Mask(XXSPLTI32DX) Var(rs6000_isa_flags)
+Generate (do not generate) the XXSPLTI32DX instruction.
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 951ce659872..56e3cf6756f 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -370,6 +370,8 @@
UNSPEC_VDIVES
UNSPEC_VDIVEU
UNSPEC_XXSPLTIDP
+ UNSPEC_XXSPLTI32DX
+ UNSPEC_XXSPLTI32DX_CONST
])
(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -1193,7 +1195,7 @@
(set_attr "num_insns"
"*, *, *, 2, *, 2,
2, 2, 2, 2, *, *,
- *, 5, 2, *, *")
+ *, 6, 2, *, *")
(set_attr "max_prefixed_insns"
"*, *, *, *, *, 2,
2, 2, 2, 2, *, *,
@@ -1201,7 +1203,7 @@
(set_attr "length"
"*, *, *, 8, *, 8,
8, 8, 8, 8, *, *,
- *, 20, 8, *, *")
+ *, 24, 8, *, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
*, *, *, *, p9v, *,
@@ -1233,7 +1235,7 @@
vecstore, vecload")
(set_attr "length"
"*, *, *, 16, 16, 16,
- *, *, *, 20, 16,
+ *, *, *, 24, 16,
*, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
@@ -6326,3 +6328,137 @@
rs6000_emit_xxspltidp_v2df (operands[0], value);
DONE;
})
+
+;; XXSPLTI32DX used to create 64-bit constants or 32-bit vector constants where
+;; the even elements match and the odd elements match.
+(define_mode_iterator XXSPLTI32DX [SF DF V4SF V4SI V2DF V2DI])
+
+(define_insn_and_split "*xxsplti32dx_<mode>"
+ [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+ (match_operand:XXSPLTI32DX 1 "xxsplti32dx_operand"))]
+ "TARGET_XXSPLTI32DX"
+ "#"
+ "&& 1"
+ [(set (match_dup 0)
+ (unspec:XXSPLTI32DX [(match_dup 2)
+ (match_dup 3)] UNSPEC_XXSPLTI32DX_CONST))
+ (set (match_dup 0)
+ (unspec:XXSPLTI32DX [(match_dup 0)
+ (match_dup 4)
+ (match_dup 5)] UNSPEC_XXSPLTI32DX_CONST))]
+{
+ HOST_WIDE_INT value = 0;
+
+ if (!xxsplti32dx_constant_p (operands[1], <MODE>mode, &value))
+ gcc_unreachable ();
+
+ HOST_WIDE_INT high = value >> 32;
+ HOST_WIDE_INT low = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
+
+ /* If the low bits are 0 or all 1s, initialize that word first. This way we
+ can use a smaller XXSPLTIB instruction instead of the first XXSPLTI32DX. */
+ if (low == 0 || low == -1)
+ {
+ operands[2] = const1_rtx;
+ operands[3] = GEN_INT (low);
+ operands[4] = const0_rtx;
+ operands[5] = GEN_INT (high);
+ }
+ else
+ {
+ operands[2] = const0_rtx;
+ operands[3] = GEN_INT (high);
+ operands[4] = const1_rtx;
+ operands[5] = GEN_INT (low);
+ }
+}
+ [(set_attr "type" "vecperm")
+ (set_attr "prefixed" "yes")
+ (set_attr "num_insns" "2")
+ (set_attr "max_prefixed_insns" "2")])
+
+;; First word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_first"
+ [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa,wa,wa")
+ (unspec:XXSPLTI32DX [(match_operand 1 "u1bit_cint_operand" "n,n,n")
+ (match_operand 2 "const_int_operand" "O,wM,n")]
+ UNSPEC_XXSPLTI32DX_CONST))]
+ "TARGET_XXSPLTI32DX"
+ "@
+ xxspltib %x0,0
+ xxspltib %x0,255
+ xxsplti32dx %x0,%1,%2"
+ [(set_attr "type" "vecperm")
+ (set_attr "prefixed" "*,*,yes")])
+
+;; Second word of XXSPLTI32DX
+(define_insn "*xxsplti32dx_<mode>_second"
+ [(set (match_operand:XXSPLTI32DX 0 "vsx_register_operand" "=wa")
+ (unspec:XXSPLTI32DX [(match_operand:XXSPLTI32DX 1 "vsx_register_operand" "0")
+ (match_operand 2 "u1bit_cint_operand" "n")
+ (match_operand 3 "const_int_operand" "n")]
+ UNSPEC_XXSPLTI32DX_CONST))]
+ "TARGET_XXSPLTI32DX"
+ "xxsplti32dx %x0,%2,%3"
+ [(set_attr "type" "vecperm")
+ (set_attr "prefixed" "yes")])
+
+;; XXSPLTI32DX built-in support.
+(define_expand "xxsplti32dx_v4si"
+ [(set (match_operand:V4SI 0 "register_operand" "=wa")
+ (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+ (match_operand:QI 2 "u1bit_cint_operand" "n")
+ (match_operand:SI 3 "s32bit_cint_operand" "n")]
+ UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+{
+ int index = INTVAL (operands[2]);
+
+ if (!BYTES_BIG_ENDIAN)
+ index = 1 - index;
+
+ emit_insn (gen_xxsplti32dx_v4si_inst (operands[0], operands[1],
+ GEN_INT (index), operands[3]));
+ DONE;
+}
+ [(set_attr "type" "vecsimple")])
+
+(define_insn "xxsplti32dx_v4si_inst"
+ [(set (match_operand:V4SI 0 "register_operand" "=wa")
+ (unspec:V4SI [(match_operand:V4SI 1 "register_operand" "0")
+ (match_operand:QI 2 "u1bit_cint_operand" "n")
+ (match_operand:SI 3 "s32bit_cint_operand" "n")]
+ UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+ "xxsplti32dx %x0,%2,%3"
+ [(set_attr "type" "vecsimple")
+ (set_attr "prefixed" "yes")])
+
+(define_expand "xxsplti32dx_v4sf"
+ [(set (match_operand:V4SF 0 "register_operand" "=wa")
+ (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+ (match_operand:QI 2 "u1bit_cint_operand" "n")
+ (match_operand:SF 3 "const_double_operand" "n")]
+ UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+{
+ int index = INTVAL (operands[2]);
+ long value = rs6000_const_f32_to_i32 (operands[3]);
+ if (!BYTES_BIG_ENDIAN)
+ index = 1 - index;
+
+ emit_insn (gen_xxsplti32dx_v4sf_inst (operands[0], operands[1],
+ GEN_INT (index), GEN_INT (value)));
+ DONE;
+})
+
+(define_insn "xxsplti32dx_v4sf_inst"
+ [(set (match_operand:V4SF 0 "register_operand" "=wa")
+ (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "0")
+ (match_operand:QI 2 "u1bit_cint_operand" "n")
+ (match_operand:SI 3 "s32bit_cint_operand" "n")]
+ UNSPEC_XXSPLTI32DX))]
+ "TARGET_POWER10"
+ "xxsplti32dx %x0,%2,%3"
+ [(set_attr "type" "vecsimple")
+ (set_attr "prefixed" "yes")])
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
index 8f6e176f9af..1435ef4ef4f 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
@@ -48,13 +48,16 @@ scalar_double_m_inf (void) /* XXSPLTIDP. */
double
scalar_double_pi (void)
{
- return M_PI; /* PLFD. */
+ return M_PI; /* 2x XXSPLTI32DX. */
}
double
scalar_double_denorm (void)
{
- return 0x1p-149f; /* PLFD. */
+ return 0x1p-149f; /* XXSPLTIB, XXSPLTI32DX. */
}
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not {\mplfd\M} } } */
+/* { dg-final { scan-assembler-not {\mplxsd\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
index 72504bdfbbd..e9a45d5159d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
@@ -57,4 +57,7 @@ scalar_float_denorm (void)
return 0x1p-149f; /* PLFS. */
}
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 1 } } */
+/* { dg-final { scan-assembler-not {\mplfs\M} } } */
+/* { dg-final { scan-assembler-not {\mplxssp\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
index d509459292c..d81198b163d 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
@@ -51,14 +51,16 @@ v2df_double_m_inf (void)
vector double
v2df_double_pi (void)
{
- return (vector double) { M_PI, M_PI }; /* PLFD. */
+ return (vector double) { M_PI, M_PI }; /* 2x XXSPLTI32DX. */
}
vector double
v2df_double_denorm (void)
{
- return (vector double) { (double)0x1p-149f,
- (double)0x1p-149f }; /* PLFD. */
+ return (vector double) { (double)0x1p-149f, /* XXSPLTIB, */
+ (double)0x1p-149f }; /* XXSPLTI32DX. */
}
-/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-not {\mplxv\M} } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
index 06a8289d09b..f0eb982eadf 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec-splati-runnable.c
@@ -162,4 +162,4 @@ main (int argc, char *argv [])
/* { dg-final { scan-assembler-times {\mxxspltiw\M} 1 } } */
/* { dg-final { scan-assembler-times {\mxxspltidp\M} 2 } } */
-/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mxxsplti32dx\M} 4 } } */