public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work046)] Implement XXSPLTIW support.
@ 2021-04-07 17:13 Michael Meissner
From: Michael Meissner @ 2021-04-07 17:13 UTC
To: gcc-cvs
https://gcc.gnu.org/g:b0bdb408113f6d1a6620d8bbbdd794dbddda95bc
commit b0bdb408113f6d1a6620d8bbbdd794dbddda95bc
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Wed Apr 7 13:13:29 2021 -0400
Implement XXSPLTIW support.
This patch implements XXSPLTIW support for V8HI, V4SI, and V4SF vector
constants.
A new constraint (eW) is added to match constants that can be loaded with
the XXSPLTIW instruction.
I have moved the XXSPLTIW built-in function support from altivec.md to
vsx.md because the functions can load any VSX register, not just the
ALTIVEC registers. I have also re-implemented the built-in functions to
load the vector constants, which will be optimized to generate the
appropriate XXSPLTIW, VSPLTISH, VSPLTISW, or XXSPLTIB instruction.
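As a rough illustration of how the final instruction might be picked for a
splatted constant, the standalone C sketch below mirrors the checks this patch
adds to output_vec_const_move. The enum and function names (splat_mode,
splat_insn, sign_extend_16, choose_splat) are hypothetical and exist only for
this example; the real code works on RTL operands, and the XXSPLTIB case is
handled by a separate, pre-existing path that is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the vector modes and output instructions.  */
enum splat_mode { MODE_V8HI, MODE_V4SI, MODE_V4SF };
enum splat_insn { SPLAT_VSPLTISH, SPLAT_VSPLTISW, SPLAT_XXSPLTIW };

/* Sign-extend the low 16 bits of VALUE, using the same xor/subtract idiom
   as the patch.  */
static long
sign_extend_16 (long value)
{
  return ((value & 0xffff) ^ 0x8000) - 0x8000;
}

/* Pick the instruction for the 32-bit immediate IMM.  The shorter
   VSPLTISH/VSPLTISW forms only reach the Altivec (VMX) registers and only
   encode values in the range -16..15.  */
static enum splat_insn
choose_splat (enum splat_mode mode, bool dest_is_altivec, long imm)
{
  if (dest_is_altivec && mode == MODE_V8HI)
    {
      long hi = sign_extend_16 (imm);
      if (hi >= -16 && hi <= 15)
        return SPLAT_VSPLTISH;
    }

  if (dest_is_altivec && mode == MODE_V4SI && imm >= -16 && imm <= 15)
    return SPLAT_VSPLTISW;

  /* Everything else uses the prefixed XXSPLTIW.  */
  return SPLAT_XXSPLTIW;
}
```

Under these assumptions, a V4SI splat of 5 into an Altivec register selects
VSPLTISW, while the same constant aimed at a non-Altivec VSX register, or any
V4SF constant, falls through to XXSPLTIW.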
I have added a temporary switch (-mxxspltiw) to control whether or not the
XXSPLTIW instruction is generated.
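The way the switch defaults could be pictured is sketched below in plain C. It
follows the check added to rs6000_option_override_internal: the option is
turned on for Power10 with VSX unless the user set it explicitly. The mask
value and the function name apply_xxspltiw_default are assumptions made for
this example and do not correspond to the real bit layout of rs6000_isa_flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real mask is generated from rs6000.opt.  */
#define MASK_XXSPLTIW (UINT64_C (1) << 0)

/* Return FLAGS with the XXSPLTIW bit enabled by default when targeting
   Power10 with VSX, unless the bit appears in EXPLICIT_FLAGS (meaning the
   user passed -mxxspltiw or -mno-xxspltiw).  */
static uint64_t
apply_xxspltiw_default (uint64_t flags, uint64_t explicit_flags,
                        bool power10, bool vsx)
{
  if (power10 && vsx && (explicit_flags & MASK_XXSPLTIW) == 0)
    flags |= MASK_XXSPLTIW;

  return flags;
}
```

With this logic an explicit -mno-xxspltiw is left alone, because the bit is
recorded in the explicit mask and the default is never applied.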
This patch provides an xxspltiw_constant_p function, which decodes both
VEC_DUPLICATE and CONST_VECTOR expressions (similar to the existing
xxspltib_constant_p function).
The xxspltiw_constant_p function returns the integer that will be used as the
immediate operand of the XXSPLTIW instruction. For V8HI constants, this is the
16-bit element replicated into both halves of a 32-bit constant; for V4SF
constants, it is the integer representation of the 32-bit floating-point value.
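To make the encoding concrete, here is a minimal standalone C sketch of how the
32-bit immediate could be derived from a single element. The helper names
(imm_from_v8hi, imm_from_v4sf) are hypothetical and not part of GCC; the patch
does this work on RTL inside xxspltiw_constant_p and relies on
rs6000_const_f32_to_i32 for the floating-point case. The sketch assumes float
is IEEE 754 binary32.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build the immediate for a V8HI splat by placing the
   same 16-bit element in both halves of the 32-bit word.  */
static uint32_t
imm_from_v8hi (int16_t element)
{
  uint32_t half = (uint16_t) element;
  return half | (half << 16);
}

/* Hypothetical helper: reinterpret the bits of a single-precision value as
   a 32-bit integer (assumes IEEE 754 binary32).  */
static uint32_t
imm_from_v4sf (float element)
{
  uint32_t bits;
  memcpy (&bits, &element, sizeof bits);
  return bits;
}
```

For example, a V8HI element of 0x1234 gives the immediate 0x12341234, an
element of -2 gives 0xfffefffe, and the V4SF element 1.0f gives 0x3f800000.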
gcc/
2021-04-07 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/altivec.md (UNSPEC_XXSPLTIW): Move to vsx.md.
(xxspltiw_v4si): Move to vsx.md and re-implement.
(xxspltiw_v4sf): Move to vsx.md and re-implement.
(xxspltiw_v4sf_inst): Delete.
* config/rs6000/constraints.md (eW): New constraint.
* config/rs6000/predicates.md (xxspltiw_operand): New predicate.
(easy_vector_constant): If we can generate XXSPLTIW, mark the
vector constant as easy.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mxxspltiw support.
(POWERPC_MASKS): Add -mxxspltiw support.
* config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
declaration.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
-mxxspltiw support.
(xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
XXSPLTIB and a vector extend instruction.
(xxspltiw_constant_p): New function.
(output_vec_const_move): Add support for XXSPLTIW.
(rs6000_opt_masks): Add -mxxspltiw support.
* config/rs6000/rs6000.opt (-mxxspltiw): New switch.
* config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Move here from
altivec.md.
(vsx_mov<mode>_64bit): Add XXSPLTIW support.
(vsx_mov<mode>_32bit): Add XXSPLTIW support.
(XXSPLTIW): New mode iterator.
(xxspltiw_<mode>_internal1): New define_insn_and_split.
(xxspltiw_<mode>_internal2): New define_insn.
(xxspltiw_v4si): Move to vsx.md from altivec.md. Re-implement to
use the new constant format.
(xxspltiw_v4sf): Move to vsx.md from altivec.md. Re-implement to
use the new constant format.
Diff:
---
gcc/config/rs6000/altivec.md | 30 -------------
gcc/config/rs6000/constraints.md | 5 +++
gcc/config/rs6000/predicates.md | 22 +++++++++
gcc/config/rs6000/rs6000-cpus.def | 6 ++-
gcc/config/rs6000/rs6000-protos.h | 1 +
gcc/config/rs6000/rs6000.c | 93 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/rs6000.opt | 3 ++
gcc/config/rs6000/vsx.md | 95 +++++++++++++++++++++++++++++++++------
8 files changed, 209 insertions(+), 46 deletions(-)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..708296cb14d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -176,7 +176,6 @@
UNSPEC_VSTRIL
UNSPEC_SLDB
UNSPEC_SRDB
- UNSPEC_XXSPLTIW
UNSPEC_XXSPLTID
UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
@@ -820,35 +819,6 @@
"vs<SLDB_lr>dbi %0,%1,%2,%3"
[(set_attr "type" "vecsimple")])
-(define_insn "xxspltiw_v4si"
- [(set (match_operand:V4SI 0 "register_operand" "=wa")
- (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
- long long value = rs6000_const_f32_to_i32 (operands[1]);
- emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
- DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
(define_expand "xxspltidp_v2df"
[(set (match_operand:V2DF 0 "register_operand" )
(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 561ce9797af..b3e36fbcfdf 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,11 @@
"A signed 34-bit integer constant if prefixed instructions are supported."
(match_operand 0 "cint34_operand"))
+;; V4SI/V4SF/V8HI vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+ "A vector constant that can be loaded with the XXSPLTIW instruction."
+ (match_operand 0 "xxspltiw_operand"))
+
;; Floating-point constraints. These two are defined so that insn
;; length attributes can be calculated exactly.
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..dc23f62a3af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,25 @@
return num_insns == 1;
})
+;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a vector
+;; using the ISA 3.1 XXSPLTIW instruction. Do not return 1 if the value can be
+;; loaded with a smaller XXSPLTIB or VSPLTISW instruction.
+(define_predicate "xxspltiw_operand"
+ (match_code "vec_duplicate,const_vector")
+{
+ HOST_WIDE_INT value = 0;
+
+ if (!xxspltiw_constant_p (op, mode, &value))
+ return false;
+
+ /* xxspltiw_constant_p returns V8HI as (element | (element << 16)). Undo
+ this to see if the value is in the range -16..15. */
+ if (mode == V8HImode)
+ value = ((value & 0xffff) ^ 0x8000) - 0x8000;
+
+ return !EASY_VECTOR_15 (value);
+})
+
;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
;; vector register without using memory.
(define_predicate "easy_vector_constant"
@@ -653,6 +672,9 @@
if (zero_constant (op, mode) || all_ones_constant (op, mode))
return true;
+ if (xxspltiw_operand (op, mode))
+ return true;
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (op, mode, &num_insns, &value))
return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..f7743374f26 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -78,7 +78,8 @@
#define OTHER_POWER10_MASKS (OPTION_MASK_MMA \
| OPTION_MASK_PCREL \
| OPTION_MASK_PCREL_OPT \
- | OPTION_MASK_PREFIXED)
+ | OPTION_MASK_PREFIXED \
+ | OPTION_MASK_XXSPLTIW)
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
@@ -160,7 +161,8 @@
| OPTION_MASK_RECIP_PRECISION \
| OPTION_MASK_SOFT_FLOAT \
| OPTION_MASK_STRICT_ALIGN_OPTIONAL \
- | OPTION_MASK_VSX)
+ | OPTION_MASK_VSX \
+ | OPTION_MASK_XXSPLTIW)
#endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8ac30905013..eff72af8814 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
extern bool easy_altivec_constant (rtx, machine_mode);
extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
extern int vspltis_shifted (rtx);
extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index dc38c093c53..54c338d73a5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4476,6 +4476,10 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags &= ~OPTION_MASK_MMA;
}
+ if (TARGET_POWER10 && TARGET_VSX
+ && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+ rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+
if (!TARGET_PCREL && TARGET_PCREL_OPT)
rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
@@ -6460,6 +6464,12 @@ xxspltib_constant_p (rtx op,
else if (IN_RANGE (value, -1, 0))
*num_insns_ptr = 1;
+ /* If XXSPLTIW is available, don't return true if we can use that
+ instruction instead of doing 2 instructions. */
+ else if (TARGET_XXSPLTIW
+ && (mode == V4SImode || mode == V8HImode))
+ return false;
+
else
*num_insns_ptr = 2;
@@ -6467,6 +6477,66 @@ xxspltib_constant_p (rtx op,
return true;
}
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+ XXSPLTIW instruction, possibly with a sign extension.
+
+ Return the constant that is being split via CONSTANT_PTR. */
+
+bool
+xxspltiw_constant_p (rtx op,
+ machine_mode mode,
+ HOST_WIDE_INT *constant_ptr)
+{
+ *constant_ptr = 0;
+
+ if (!TARGET_XXSPLTIW)
+ return false;
+
+ if (mode == VOIDmode)
+ mode = GET_MODE (op);
+
+ if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+ return false;
+
+ rtx element = op;
+ if (GET_CODE (op) == VEC_DUPLICATE)
+ element = XEXP (op, 0);
+
+ else if (GET_CODE (op) == CONST_VECTOR)
+ {
+ size_t nunits = GET_MODE_NUNITS (mode);
+ element = CONST_VECTOR_ELT (op, 0);
+
+ for (size_t i = 1; i < nunits; i++)
+ if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+ return false;
+ }
+
+ HOST_WIDE_INT value;
+ if (CONST_INT_P (element))
+ {
+ value = INTVAL (element);
+ if (!SIGNED_INTEGER_NBIT_P (value, 32))
+ return false;
+
+ /* For V8HImode, return the value setting 2 elements of the constant. */
+ if (mode == V8HImode)
+ {
+ value &= 0xffff;
+ value |= value << 16;
+ }
+ }
+
+ else if (CONST_DOUBLE_P (element))
+ value = rs6000_const_f32_to_i32 (element);
+
+ else
+ return false;
+
+ *constant_ptr = value;
+ return true;
+}
+
const char *
output_vec_const_move (rtx *operands)
{
@@ -6511,6 +6581,28 @@ output_vec_const_move (rtx *operands)
gcc_unreachable ();
}
+ HOST_WIDE_INT xxspltiw_value = 0;
+ if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+ {
+ /* Generate the smaller VSPLTIS{H,W} if we can. */
+ if (dest_vmx_p && mode == V8HImode)
+ {
+ long hi_value = ((xxspltiw_value & 0xffff) ^ 0x8000) - 0x8000;
+ if (IN_RANGE (hi_value, -16, 15))
+ {
+ operands[2] = GEN_INT (hi_value);
+ return "vspltish %0,%2";
+ }
+ }
+
+ operands[2] = GEN_INT (xxspltiw_value);
+ if (dest_vmx_p && mode == V4SImode
+ && IN_RANGE (xxspltiw_value, -16, 15))
+ return "vspltisw %0,%2";
+
+ return "xxspltiw %x0,%2";
+ }
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
{
@@ -24008,6 +24100,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "string", 0, false, true },
{ "update", OPTION_MASK_NO_UPDATE, true , true },
{ "vsx", OPTION_MASK_VSX, false, true },
+ { "xxspltiw", OPTION_MASK_XXSPLTIW, false, true },
#ifdef OPTION_MASK_64BIT
#if TARGET_AIX_OS
{ "aix64", OPTION_MASK_64BIT, false, false },
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0dbdf753673..06e7cdbbced 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -619,3 +619,6 @@ Generate (do not generate) MMA instructions.
mrelative-jumptables
Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bcb92be2f5c..825e8b1480b 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -369,6 +369,7 @@
UNSPEC_REPLACE_UN
UNSPEC_VDIVES
UNSPEC_VDIVEU
+ UNSPEC_XXSPLTIW
])
(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -1167,17 +1168,17 @@
;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR)
;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW
-;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX)
+;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) XXSPLTI*
(define_insn "vsx_mov<mode>_64bit"
[(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO, wa, wa, r, we, ?wQ,
?&r, ??r, ??Y, <??r>, wa, v,
- ?wa, v, <??r>, wZ, v")
+ ?wa, v, <??r>, wZ, v, wa")
(match_operand:VSX_M 1 "input_operand"
"wa, ZwO, wa, we, r, r,
wQ, Y, r, r, wE, jwM,
- ?jwM, W, <nW>, v, wZ"))]
+ ?jwM, W, <nW>, v, wZ, eW"))]
"TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
&& (register_operand (operands[0], <MODE>mode)
@@ -1188,36 +1189,40 @@
[(set_attr "type"
"vecstore, vecload, vecsimple, mtvsr, mfvsr, load,
store, load, store, *, vecsimple, vecsimple,
- vecsimple, *, *, vecstore, vecload")
+ vecsimple, *, *, vecstore, vecload, vecperm")
(set_attr "num_insns"
"*, *, *, 2, *, 2,
2, 2, 2, 2, *, *,
- *, 5, 2, *, *")
+ *, 5, 2, *, *, *")
(set_attr "max_prefixed_insns"
"*, *, *, *, *, 2,
2, 2, 2, 2, *, *,
- *, *, *, *, *")
+ *, *, *, *, *, *")
(set_attr "length"
"*, *, *, 8, *, 8,
8, 8, 8, 8, *, *,
- *, 20, 8, *, *")
+ *, 20, 8, *, *, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
*, *, *, *, p9v, *,
- <VSisa>, *, *, *, *")])
+ <VSisa>, *, *, *, *, p10")
+ (set_attr "prefixed"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, *,
+ *, *, *, *, *, yes")])
;; VSX store VSX load VSX move GPR load GPR store GPR move
-;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const
+;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const XXSPLTI*
;; LVX (VMX) STVX (VMX)
(define_insn "*vsx_mov<mode>_32bit"
[(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO, wa, wa, ??r, ??Y, <??r>,
- wa, v, ?wa, v, <??r>,
+ wa, v, ?wa, v, <??r>, wa,
wZ, v")
(match_operand:VSX_M 1 "input_operand"
"wa, ZwO, wa, Y, r, r,
- wE, jwM, ?jwM, W, <nW>,
+ wE, jwM, ?jwM, W, <nW>, eW,
v, wZ"))]
"!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1228,15 +1233,19 @@
}
[(set_attr "type"
"vecstore, vecload, vecsimple, load, store, *,
- vecsimple, vecsimple, vecsimple, *, *,
+ vecsimple, vecsimple, vecsimple, *, *, vecperm,
vecstore, vecload")
(set_attr "length"
"*, *, *, 16, 16, 16,
- *, *, *, 20, 16,
+ *, *, *, 20, 16, *,
*, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
- p9v, *, <VSisa>, *, *,
+ p9v, *, <VSisa>, *, *, p10,
+ *, *")
+ (set_attr "prefixed"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, yes,
*, *")])
;; Explicit load/store expanders for the builtin functions
@@ -6216,3 +6225,61 @@
"TARGET_POWER10"
"vmulld %0,%1,%2"
[(set_attr "type" "veccomplex")])
+
+\f
+;; XXSPLTIW support.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_insn_and_split "*xxspltiw_<mode>_internal1"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand" "=wa")
+ (match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+ "TARGET_XXSPLTIW"
+ "#"
+ "&& 1"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+ (unspec:XXSPLTIW [(match_dup 2)] UNSPEC_XXSPLTIW))]
+{
+ HOST_WIDE_INT value = 0;
+
+ if (!xxspltiw_constant_p (operands[1], <MODE>mode, &value))
+ gcc_unreachable ();
+
+ operands[2] = GEN_INT (value);
+}
+ [(set_attr "type" "vecsimple")
+ (set_attr "prefixed" "yes")])
+
+(define_insn "*xxspltiw_<mode>_internal2"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand" "=wa")
+ (unspec:XXSPLTIW [(match_operand 1 "const_int_operand" "n")]
+ UNSPEC_XXSPLTIW))]
+ "TARGET_XXSPLTIW"
+ "xxspltiw %x0,%1"
+ [(set_attr "type" "vecsimple")
+ (set_attr "prefixed" "yes")])
+
+;; Implement XXSPLTIW built-in functions as just loading up the appropriate
+;; constant vector. The normal optimizations will generate XXSPLTIW.
+(define_expand "xxspltiw_v4si"
+ [(use (match_operand:V4SI 0 "register_operand"))
+ (use (match_operand:SI 1 "s32bit_cint_operand"))]
+ "TARGET_POWER10"
+{
+ rtx op1 = operands[1];
+ rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+ rtx cv = gen_rtx_CONST_VECTOR (V4SImode, rv);
+ emit_move_insn (operands[0], cv);
+ DONE;
+})
+
+(define_expand "xxspltiw_v4sf"
+ [(use (match_operand:V4SF 0 "register_operand"))
+ (use (match_operand:SF 1 "const_double_operand"))]
+ "TARGET_POWER10"
+{
+ rtx op1 = operands[1];
+ rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+ rtx cv = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+ emit_move_insn (operands[0], cv);
+ DONE;
+})
* [gcc(refs/users/meissner/heads/work046)] Implement XXSPLTIW support.
@ 2021-04-07 18:46 Michael Meissner
From: Michael Meissner @ 2021-04-07 18:46 UTC
To: gcc-cvs
https://gcc.gnu.org/g:9ec2b7a1a16105ab16efed20302107b875c06026
commit 9ec2b7a1a16105ab16efed20302107b875c06026
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Wed Apr 7 14:46:39 2021 -0400
Implement XXSPLTIW support.
This patch implements XXSPLTIW support for V8HI, V4SI, and V4SF vector
constants.
A new constraint (eW) is added to match constants that can be loaded with
the XXSPLTIW instruction.
I have moved the XXSPLTIW built-in function support from altivec.md to
vsx.md because the functions can load any VSX register, not just the
ALTIVEC registers. I have also re-implemented the built-in functions to
load the vector constants, which will be optimized to generate the
appropriate XXSPLTIW, VSPLTISH, VSPLTISW, or XXSPLTIB instruction.
I have added a temporary switch (-mxxspltiw) to control whether or not the
XXSPLTIW instruction is generated.
This patch provides an xxspltiw_constant_p function, which decodes both
VEC_DUPLICATE and CONST_VECTOR expressions (similar to the existing
xxspltib_constant_p function).
The xxspltiw_constant_p function returns the integer that will be used as the
immediate operand of the XXSPLTIW instruction. For V8HI constants, this is the
16-bit element replicated into both halves of a 32-bit constant; for V4SF
constants, it is the integer representation of the 32-bit floating-point value.
gcc/
2021-04-07 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/altivec.md (UNSPEC_XXSPLTIW): Move to vsx.md.
(xxspltiw_v4si): Move to vsx.md and re-implement.
(xxspltiw_v4sf): Move to vsx.md and re-implement.
(xxspltiw_v4sf_inst): Delete.
* config/rs6000/constraints.md (eW): New constraint.
* config/rs6000/predicates.md (xxspltiw_operand): New predicate.
(easy_vector_constant): If we can generate XXSPLTIW, mark the
vector constant as easy.
* config/rs6000/rs6000-cpus.def (OTHER_POWER10_MASKS): Add
-mxxspltiw support.
(POWERPC_MASKS): Add -mxxspltiw support.
* config/rs6000/rs6000-protos.h (xxspltiw_constant_p): New
declaration.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Add
-mxxspltiw support.
(xxspltib_constant_p): If we can generate XXSPLTIW, don't generate
XXSPLTIB and a vector extend instruction.
(xxspltiw_constant_p): New function.
(output_vec_const_move): Add support for XXSPLTIW.
(rs6000_opt_masks): Add -mxxspltiw support.
* config/rs6000/rs6000.opt (-mxxspltiw): New switch.
* config/rs6000/vsx.md (UNSPEC_XXSPLTIW): Move here from
altivec.md.
(vsx_mov<mode>_64bit): Add XXSPLTIW support.
(vsx_mov<mode>_32bit): Add XXSPLTIW support.
(XXSPLTIW): New mode iterator.
(xxspltiw_<mode>_internal1): New define_insn_and_split.
(xxspltiw_<mode>_internal2): New define_insn.
(xxspltiw_v4si): Move to vsx.md from altivec.md. Re-implement to
use the new constant format.
(xxspltiw_v4sf): Move to vsx.md from altivec.md. Re-implement to
use the new constant format.
Diff:
---
gcc/config/rs6000/altivec.md | 30 -------------
gcc/config/rs6000/constraints.md | 5 +++
gcc/config/rs6000/predicates.md | 22 +++++++++
gcc/config/rs6000/rs6000-cpus.def | 6 ++-
gcc/config/rs6000/rs6000-protos.h | 1 +
gcc/config/rs6000/rs6000.c | 93 ++++++++++++++++++++++++++++++++++++++
gcc/config/rs6000/rs6000.opt | 3 ++
gcc/config/rs6000/vsx.md | 95 +++++++++++++++++++++++++++++++++------
8 files changed, 209 insertions(+), 46 deletions(-)
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 1351dafbc41..708296cb14d 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -176,7 +176,6 @@
UNSPEC_VSTRIL
UNSPEC_SLDB
UNSPEC_SRDB
- UNSPEC_XXSPLTIW
UNSPEC_XXSPLTID
UNSPEC_XXSPLTI32DX
UNSPEC_XXBLEND
@@ -820,35 +819,6 @@
"vs<SLDB_lr>dbi %0,%1,%2,%3"
[(set_attr "type" "vecsimple")])
-(define_insn "xxspltiw_v4si"
- [(set (match_operand:V4SI 0 "register_operand" "=wa")
- (unspec:V4SI [(match_operand:SI 1 "s32bit_cint_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
-(define_expand "xxspltiw_v4sf"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:SF 1 "const_double_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
-{
- long long value = rs6000_const_f32_to_i32 (operands[1]);
- emit_insn (gen_xxspltiw_v4sf_inst (operands[0], GEN_INT (value)));
- DONE;
-})
-
-(define_insn "xxspltiw_v4sf_inst"
- [(set (match_operand:V4SF 0 "register_operand" "=wa")
- (unspec:V4SF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
- UNSPEC_XXSPLTIW))]
- "TARGET_POWER10"
- "xxspltiw %x0,%1"
- [(set_attr "type" "vecsimple")
- (set_attr "prefixed" "yes")])
-
(define_expand "xxspltidp_v2df"
[(set (match_operand:V2DF 0 "register_operand" )
(unspec:V2DF [(match_operand:SF 1 "const_double_operand")]
diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
index 561ce9797af..b3e36fbcfdf 100644
--- a/gcc/config/rs6000/constraints.md
+++ b/gcc/config/rs6000/constraints.md
@@ -213,6 +213,11 @@
"A signed 34-bit integer constant if prefixed instructions are supported."
(match_operand 0 "cint34_operand"))
+;; V4SI/V4SF/V8HI vector constant that can be loaded with XXSPLTIW
+(define_constraint "eW"
+ "A vector constant that can be loaded with the XXSPLTIW instruction."
+ (match_operand 0 "xxspltiw_operand"))
+
;; Floating-point constraints. These two are defined so that insn
;; length attributes can be calculated exactly.
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index e21bc745f72..dc23f62a3af 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -640,6 +640,25 @@
return num_insns == 1;
})
+;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a vector
+;; using the ISA 3.1 XXSPLTIW instruction. Do not return 1 if the value can be
+;; loaded with a smaller XXSPLTIB or VSPLTISW instruction.
+(define_predicate "xxspltiw_operand"
+ (match_code "vec_duplicate,const_vector")
+{
+ HOST_WIDE_INT value = 0;
+
+ if (!xxspltiw_constant_p (op, mode, &value))
+ return false;
+
+ /* xxspltiw_constant_p returns V8HI as (element | (element << 16)). Undo
+ this to see if the value is in the range -16..15. */
+ if (mode == V8HImode)
+ value = ((value & 0xffff) ^ 0x8000) - 0x8000;
+
+ return !EASY_VECTOR_15 (value);
+})
+
;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
;; vector register without using memory.
(define_predicate "easy_vector_constant"
@@ -653,6 +672,9 @@
if (zero_constant (op, mode) || all_ones_constant (op, mode))
return true;
+ if (xxspltiw_operand (op, mode))
+ return true;
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (op, mode, &num_insns, &value))
return true;
diff --git a/gcc/config/rs6000/rs6000-cpus.def b/gcc/config/rs6000/rs6000-cpus.def
index cbbb42c1b3a..f7743374f26 100644
--- a/gcc/config/rs6000/rs6000-cpus.def
+++ b/gcc/config/rs6000/rs6000-cpus.def
@@ -78,7 +78,8 @@
#define OTHER_POWER10_MASKS (OPTION_MASK_MMA \
| OPTION_MASK_PCREL \
| OPTION_MASK_PCREL_OPT \
- | OPTION_MASK_PREFIXED)
+ | OPTION_MASK_PREFIXED \
+ | OPTION_MASK_XXSPLTIW)
#define ISA_3_1_MASKS_SERVER (ISA_3_0_MASKS_SERVER \
| OPTION_MASK_POWER10 \
@@ -160,7 +161,8 @@
| OPTION_MASK_RECIP_PRECISION \
| OPTION_MASK_SOFT_FLOAT \
| OPTION_MASK_STRICT_ALIGN_OPTIONAL \
- | OPTION_MASK_VSX)
+ | OPTION_MASK_VSX \
+ | OPTION_MASK_XXSPLTIW)
#endif
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 8ac30905013..eff72af8814 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
extern bool easy_altivec_constant (rtx, machine_mode);
extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool xxspltiw_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
extern int vspltis_shifted (rtx);
extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index dc38c093c53..54c338d73a5 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4476,6 +4476,10 @@ rs6000_option_override_internal (bool global_init_p)
rs6000_isa_flags &= ~OPTION_MASK_MMA;
}
+ if (TARGET_POWER10 && TARGET_VSX
+ && (rs6000_isa_flags_explicit & OPTION_MASK_XXSPLTIW) == 0)
+ rs6000_isa_flags |= OPTION_MASK_XXSPLTIW;
+
if (!TARGET_PCREL && TARGET_PCREL_OPT)
rs6000_isa_flags &= ~OPTION_MASK_PCREL_OPT;
@@ -6460,6 +6464,12 @@ xxspltib_constant_p (rtx op,
else if (IN_RANGE (value, -1, 0))
*num_insns_ptr = 1;
+ /* If XXSPLTIW is available, don't return true if we can use that
+ instruction instead of doing 2 instructions. */
+ else if (TARGET_XXSPLTIW
+ && (mode == V4SImode || mode == V8HImode))
+ return false;
+
else
*num_insns_ptr = 2;
@@ -6467,6 +6477,66 @@ xxspltib_constant_p (rtx op,
return true;
}
+/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
+ XXSPLTIW instruction, possibly with a sign extension.
+
+ Return the constant that is being split via CONSTANT_PTR. */
+
+bool
+xxspltiw_constant_p (rtx op,
+ machine_mode mode,
+ HOST_WIDE_INT *constant_ptr)
+{
+ *constant_ptr = 0;
+
+ if (!TARGET_XXSPLTIW)
+ return false;
+
+ if (mode == VOIDmode)
+ mode = GET_MODE (op);
+
+ if (mode != V8HImode && mode != V4SImode && mode != V4SFmode)
+ return false;
+
+ rtx element = op;
+ if (GET_CODE (op) == VEC_DUPLICATE)
+ element = XEXP (op, 0);
+
+ else if (GET_CODE (op) == CONST_VECTOR)
+ {
+ size_t nunits = GET_MODE_NUNITS (mode);
+ element = CONST_VECTOR_ELT (op, 0);
+
+ for (size_t i = 1; i < nunits; i++)
+ if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, i)))
+ return false;
+ }
+
+ HOST_WIDE_INT value;
+ if (CONST_INT_P (element))
+ {
+ value = INTVAL (element);
+ if (!SIGNED_INTEGER_NBIT_P (value, 32))
+ return false;
+
+ /* For V8HImode, return the value setting 2 elements of the constant. */
+ if (mode == V8HImode)
+ {
+ value &= 0xffff;
+ value |= value << 16;
+ }
+ }
+
+ else if (CONST_DOUBLE_P (element))
+ value = rs6000_const_f32_to_i32 (element);
+
+ else
+ return false;
+
+ *constant_ptr = value;
+ return true;
+}
+
const char *
output_vec_const_move (rtx *operands)
{
@@ -6511,6 +6581,28 @@ output_vec_const_move (rtx *operands)
gcc_unreachable ();
}
+ HOST_WIDE_INT xxspltiw_value = 0;
+ if (xxspltiw_constant_p (vec, mode, &xxspltiw_value))
+ {
+ /* Generate the smaller VSPLTIS{H,W} if we can. */
+ if (dest_vmx_p && mode == V8HImode)
+ {
+ long hi_value = ((xxspltiw_value & 0xffff) ^ 0x8000) - 0x8000;
+ if (IN_RANGE (hi_value, -16, 15))
+ {
+ operands[2] = GEN_INT (hi_value);
+ return "vspltish %0,%2";
+ }
+ }
+
+ operands[2] = GEN_INT (xxspltiw_value);
+ if (dest_vmx_p && mode == V4SImode
+ && IN_RANGE (xxspltiw_value, -16, 15))
+ return "vspltisw %0,%2";
+
+ return "xxspltiw %x0,%2";
+ }
+
if (TARGET_P9_VECTOR
&& xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
{
@@ -24008,6 +24100,7 @@ static struct rs6000_opt_mask const rs6000_opt_masks[] =
{ "string", 0, false, true },
{ "update", OPTION_MASK_NO_UPDATE, true , true },
{ "vsx", OPTION_MASK_VSX, false, true },
+ { "xxspltiw", OPTION_MASK_XXSPLTIW, false, true },
#ifdef OPTION_MASK_64BIT
#if TARGET_AIX_OS
{ "aix64", OPTION_MASK_64BIT, false, false },
diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 0dbdf753673..06e7cdbbced 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -619,3 +619,6 @@ Generate (do not generate) MMA instructions.
mrelative-jumptables
Target Undocumented Var(rs6000_relative_jumptables) Init(1) Save
+
+mxxspltiw
+Target Undocumented Mask(XXSPLTIW) Var(rs6000_isa_flags)
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bcb92be2f5c..e5c5e157d1d 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -369,6 +369,7 @@
UNSPEC_REPLACE_UN
UNSPEC_VDIVES
UNSPEC_VDIVEU
+ UNSPEC_XXSPLTIW
])
(define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16
@@ -1167,17 +1168,17 @@
;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ (GPR)
;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB VSPLTISW
-;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX)
+;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) XXSPLTI*
(define_insn "vsx_mov<mode>_64bit"
[(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO, wa, wa, r, we, ?wQ,
?&r, ??r, ??Y, <??r>, wa, v,
- ?wa, v, <??r>, wZ, v")
+ ?wa, v, <??r>, wZ, v, wa")
(match_operand:VSX_M 1 "input_operand"
"wa, ZwO, wa, we, r, r,
wQ, Y, r, r, wE, jwM,
- ?jwM, W, <nW>, v, wZ"))]
+ ?jwM, W, <nW>, v, wZ, eW"))]
"TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
&& (register_operand (operands[0], <MODE>mode)
@@ -1188,36 +1189,40 @@
[(set_attr "type"
"vecstore, vecload, vecsimple, mtvsr, mfvsr, load,
store, load, store, *, vecsimple, vecsimple,
- vecsimple, *, *, vecstore, vecload")
+ vecsimple, *, *, vecstore, vecload, vecperm")
(set_attr "num_insns"
"*, *, *, 2, *, 2,
2, 2, 2, 2, *, *,
- *, 5, 2, *, *")
+ *, 5, 2, *, *, *")
(set_attr "max_prefixed_insns"
"*, *, *, *, *, 2,
2, 2, 2, 2, *, *,
- *, *, *, *, *")
+ *, *, *, *, *, *")
(set_attr "length"
"*, *, *, 8, *, 8,
8, 8, 8, 8, *, *,
- *, 20, 8, *, *")
+ *, 20, 8, *, *, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
*, *, *, *, p9v, *,
- <VSisa>, *, *, *, *")])
+ <VSisa>, *, *, *, *, p10")
+ (set_attr "prefixed"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, *,
+ *, *, *, *, *, yes")])
;; VSX store VSX load VSX move GPR load GPR store GPR move
-;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const
+;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const XXSPLTI*
;; LVX (VMX) STVX (VMX)
(define_insn "*vsx_mov<mode>_32bit"
[(set (match_operand:VSX_M 0 "nonimmediate_operand"
"=ZwO, wa, wa, ??r, ??Y, <??r>,
- wa, v, ?wa, v, <??r>,
+ wa, v, ?wa, v, <??r>, wa,
wZ, v")
(match_operand:VSX_M 1 "input_operand"
"wa, ZwO, wa, Y, r, r,
- wE, jwM, ?jwM, W, <nW>,
+ wE, jwM, ?jwM, W, <nW>, eW,
v, wZ"))]
"!TARGET_POWERPC64 && VECTOR_MEM_VSX_P (<MODE>mode)
@@ -1228,15 +1233,19 @@
}
[(set_attr "type"
"vecstore, vecload, vecsimple, load, store, *,
- vecsimple, vecsimple, vecsimple, *, *,
+ vecsimple, vecsimple, vecsimple, *, *, vecperm,
vecstore, vecload")
(set_attr "length"
"*, *, *, 16, 16, 16,
- *, *, *, 20, 16,
+ *, *, *, 20, 16, *,
*, *")
(set_attr "isa"
"<VSisa>, <VSisa>, <VSisa>, *, *, *,
- p9v, *, <VSisa>, *, *,
+ p9v, *, <VSisa>, *, *, p10,
+ *, *")
+ (set_attr "prefixed"
+ "*, *, *, *, *, *,
+ *, *, *, *, *, yes,
*, *")])
;; Explicit load/store expanders for the builtin functions
@@ -6216,3 +6225,61 @@
"TARGET_POWER10"
"vmulld %0,%1,%2"
[(set_attr "type" "veccomplex")])
+
+\f
+;; XXSPLTIW support.
+(define_mode_iterator XXSPLTIW [V8HI V4SI V4SF])
+
+(define_insn_and_split "*xxspltiw_<mode>_internal1"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand" "=wa")
+ (match_operand:XXSPLTIW 1 "xxspltiw_operand"))]
+ "TARGET_XXSPLTIW"
+ "#"
+ "&& 1"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand")
+ (unspec:XXSPLTIW [(match_dup 2)] UNSPEC_XXSPLTIW))]
+{
+ HOST_WIDE_INT value = 0;
+
+ if (!xxspltiw_constant_p (operands[1], <MODE>mode, &value))
+ gcc_unreachable ();
+
+ operands[2] = GEN_INT (value);
+}
+ [(set_attr "type" "vecperm")
+ (set_attr "prefixed" "yes")])
+
+(define_insn "*xxspltiw_<mode>_internal2"
+ [(set (match_operand:XXSPLTIW 0 "vsx_register_operand" "=wa")
+ (unspec:XXSPLTIW [(match_operand 1 "const_int_operand" "n")]
+ UNSPEC_XXSPLTIW))]
+ "TARGET_XXSPLTIW"
+ "xxspltiw %x0,%1"
+ [(set_attr "type" "vecperm")
+ (set_attr "prefixed" "yes")])
+
+;; Implement XXSPLTIW built-in functions as just loading up the appropriate
+;; constant vector. The normal optimizations will generate XXSPLTIW.
+(define_expand "xxspltiw_v4si"
+ [(use (match_operand:V4SI 0 "register_operand"))
+ (use (match_operand:SI 1 "s32bit_cint_operand"))]
+ "TARGET_POWER10"
+{
+ rtx op1 = operands[1];
+ rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+ rtx cv = gen_rtx_CONST_VECTOR (V4SImode, rv);
+ emit_move_insn (operands[0], cv);
+ DONE;
+})
+
+(define_expand "xxspltiw_v4sf"
+ [(use (match_operand:V4SF 0 "register_operand"))
+ (use (match_operand:SF 1 "const_double_operand"))]
+ "TARGET_POWER10"
+{
+ rtx op1 = operands[1];
+ rtvec rv = gen_rtvec (4, op1, op1, op1, op1);
+ rtx cv = gen_rtx_CONST_VECTOR (V4SFmode, rv);
+ emit_move_insn (operands[0], cv);
+ DONE;
+})