public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work054)] PR 93230: Fold sign/zero extension into vec_extract.
From: Michael Meissner @ 2021-06-03 17:52 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:8303b8a41b4c5aed3fc874097391f64b93c78b08
commit 8303b8a41b4c5aed3fc874097391f64b93c78b08
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Thu Jun 3 13:51:44 2021 -0400
PR 93230: Fold sign/zero extension into vec_extract.
gcc/
2021-06-03 Michael Meissner <meissner@linux.ibm.com>
PR target/93230
* config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Remove
support for handling MEM, users call rs6000_adjust_vec_address
directly.
* config/rs6000/vsx.md (VSX_EX_FL): New mode attribute.
(vsx_extract_v4sf_<mode>_load): Rename to vsx_extract_v4sf_load.
(vsx_extract_v4sf_to_df_load): New insn to combine vec_extract of
SFmode from memory being converted to DFmode.
(vsx_extract_v4si_<su><mode>_load): New insn to support V4SI
vec_extract from memory being converted to DImode directly without
an extra sign/zero extension.
(vsx_extract_v8hi_<su><mode>_load): New insn to support V8HI
vec_extract from memory being converted to DImode directly without
an extra sign/zero extension.
(vsx_extract_v16qi_u<mode>_load): New insn to support V16QI
vec_extract from memory being converted to DImode directly without
an extra zero extension.
(vsx_extract_v4si_var_load): Split V4SI extract from other small
integers, and add support for loading up vector registers with
sign/zero extension directly.
(vsx_extract_<mode>_var_load, VSX_EXTRACT_I2 iterator): Split
V8HI/V16QI vector extract from memory to handle loading vector
registers in addition to GPR registers.
(vsx_extract_<mode>_uns_di_var): New insn to optimize extracting a
small integer from a vector in a register and zero extending it to
DImode.
(vsx_extract_v4si_<su><mode>_var_load): New insns to support
combining a V4SI variable vector extract from memory with sign or
zero extension.
(vsx_extract_v8hi_<su><mode>_var_load): New insns to support
combining a V8HI variable vector extract from memory with sign or
zero extension.
(vsx_extract_v16qi_u<mode>_var_load): New insns to support
combining a V16QI variable vector extract from memory with zero
extension.
(vsx_ext_v4si_fl_<mode>_load): New insn to support a V4SI vector
extract that is converted to floating point to avoid doing a
direct move.
(vsx_ext_v4si_ufl_<mode>_load): New insn to support an unsigned
V4SI vector extract that is converted to floating point to avoid
doing a direct move.
(vsx_ext_v4si_fl_<mode>_var_load): New insn to support a V4SI
variable vector extract that is converted to floating point to
avoid doing a direct move.
(vsx_ext_v4si_ufl_<mode>_var_load): New insn to support an
unsigned V4SI variable vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_load): New insns
to support a V8HI/V16QI vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_load): New insns
to support an unsigned V8HI/V16QI vector extract that is converted
to floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_vl): New insns to
support a variable V8HI/V16QI vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_vl): New insns
to support an unsigned variable V8HI/V16QI vector extract that is
converted to floating point to avoid doing a direct move.
gcc/testsuite/
2021-06-03 Michael Meissner <meissner@linux.ibm.com>
PR target/93230
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/pcrel-opt-inc-di.c: Fix typo.
Diff:
---
gcc/config/rs6000/rs6000.c | 13 +-
gcc/config/rs6000/vsx.md | 525 ++++++++++++++++++++-
| 2 +-
| 6 +-
| 2 +-
5 files changed, 521 insertions(+), 27 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 420a1bb9521..c4f5f359692 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8026,18 +8026,7 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
gcc_assert (byte_shift >= 0);
- /* If we are given a memory address, optimize to load just the element. We
- don't have to adjust the vector element number on little endian
- systems. */
- if (MEM_P (src))
- {
- emit_move_insn (dest,
- rs6000_adjust_vec_address (dest, src, element, tmp_gpr,
- scalar_mode));
- return;
- }
-
- else if (REG_P (src) || SUBREG_P (src))
+ if (REG_P (src) || SUBREG_P (src))
{
int num_elements = GET_MODE_NUNITS (mode);
int bits_in_element = mode_to_bits (GET_MODE_INNER (mode));
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bc708113865..b49d5b44573 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -253,6 +253,13 @@
(TF "TARGET_FLOAT128_HW
&& FLOAT128_IEEE_P (TFmode)")])
+;; Mode attribute to give the constraint for the floating point type for vector
+;; extract and convert to floating point operations.
+(define_mode_attr VSX_EX_FL [(SF "wa")
+ (DF "wa")
+ (KF "v")
+ (TF "v")])
+
;; Iterator for the 2 short vector types to do a splat from an integer
(define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
@@ -3443,7 +3450,9 @@
DONE;
})
-;; Variable V2DI/V2DF extract from memory
+;; Variable V2DI/V2DF extract from memory. We separate these insns, because
+;; the compiler will sometimes have the vector value in a register, but then
+;; decide the best way to do this is to do a store and then a load.
(define_insn_and_split "*vsx_extract_<mode>_var_load"
[(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=wa,r")
(unspec:<VS_scalar> [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
@@ -3494,7 +3503,7 @@
[(set_attr "length" "8")
(set_attr "type" "fp")])
-(define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
+(define_insn_and_split "*vsx_extract_v4sf_load"
[(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
(vec_select:SF
(match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
@@ -3512,7 +3521,29 @@
(set_attr "length" "8")
(set_attr "isa" "*,p7v,p9v,*")])
-;; Variable V4SF extract from a register
+;; V4SF extract to DFmode
+(define_insn_and_split "*vsx_extract_v4sf_to_df_load"
+ [(set (match_operand:DF 0 "register_operand" "=f,v")
+ (float_extend:DF
+ (vec_select:SF
+ (match_operand:V4SF 1 "memory_operand" "m,m")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n")]))))
+ (clobber (match_scratch:P 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SFmode)"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (float_extend:DF (match_dup 4)))]
+{
+ rtx reg_sf = gen_rtx_REG (SFmode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_sf, operands[1], operands[2],
+ operands[3], SFmode);
+}
+ [(set_attr "type" "fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p8v")])
+
+;; Variable V4SF extract
(define_insn_and_split "vsx_extract_v4sf_var"
[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v")
@@ -3547,6 +3578,26 @@
}
[(set_attr "type" "fpload,load")])
+(define_insn_and_split "*vsx_extract_v4sf_to_df_var_load"
+ [(set (match_operand:DF 0 "gpc_reg_operand" "=wa")
+ (float_extend:DF
+ (unspec:SF [(match_operand:V4SF 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (float_extend:DF (match_dup 4)))]
+{
+ rtx reg_sf = gen_rtx_REG (SFmode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_sf, operands[1], operands[2],
+ operands[3], SFmode);
+}
+ [(set_attr "type" "fpload")
+ (set_attr "length" "8")])
+
;; Expand the builtin form of xxpermdi to canonical rtl.
(define_expand "vsx_xxpermdi_<mode>"
[(match_operand:VSX_L 0 "vsx_register_operand")
@@ -3891,7 +3942,94 @@
[(set_attr "type" "load")
(set_attr "length" "8")])
-;; Variable V16QI/V8HI/V4SI extract from a register
+;; Optimize extracting and extending a single SI element from memory. GPRs
+;; take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register.
+(define_insn_and_split "*vsx_extract_v4si_<su><mode>_load"
+ [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,wa,wa")
+ (any_extend:EXTSI
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTSI (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8")])
+
+;; Optimize extracting and extending a single HI element from memory. GPRs
+;; take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register.
+(define_insn_and_split "*vsx_extract_v8hi_<su><mode>_load"
+ [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,v,v")
+ (any_extend:EXTHI
+ (vec_select:HI
+ (match_operand:V8HI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_7_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V8HImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTHI (match_dup 4)))]
+{
+ rtx reg_hi = gen_rtx_REG (HImode, reg_or_subregno (operands[0]));
+ rtx mem = rs6000_adjust_vec_address (reg_hi, operands[1], operands[2],
+ operands[3], HImode);
+
+ /* We don't have a sign extend to a vector register, so we have to do
+ the load first and then a sign extend operation. */
+ if (int_reg_operand (operands[0], <MODE>mode) || <CODE> == ZERO_EXTEND)
+ operands[4] = mem;
+
+ else
+ {
+ emit_move_insn (reg_hi, mem);
+ operands[4] = reg_hi;
+ }
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8,12,12")
+ (set_attr "isa" "*,p9v,p9v")])
+
+;; Optimize extracting and zero extending a single QI element from memory.
+;; GPRs take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register. We
+;; don't have either a GPR load or a vector load that does sign extension, so
+;; only do the zero_extend case.
+(define_insn_and_split "*vsx_extract_v16qi_u<mode>_load"
+ [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,v,v")
+ (zero_extend:EXTQI
+ (vec_select:QI
+ (match_operand:V16QI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_15_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V16QImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (zero_extend:EXTQI (match_dup 4)))]
+{
+ rtx reg_qi = gen_rtx_REG (QImode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_qi, operands[1], operands[2],
+ operands[3], QImode);
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v,p9v")])
+
+;; Variable V16QI/V8HI/V4SI extract
(define_insn_and_split "vsx_extract_<mode>_var"
[(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r")
(unspec:<VS_scalar>
@@ -3911,14 +4049,33 @@
}
[(set_attr "isa" "p9v,*")])
-;; Variable V16QI/V8HI/V4SI extract from memory
+;; Variable V4SI extract when the vector is in memory
+(define_insn_and_split "*vsx_extract_v4si_var_load"
+ [(set (match_operand:SI 0 "gpc_reg_operand" "=r,wa")
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 4))]
+{
+ operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load")
+ (set_attr "length" "8")])
+
+;; Variable V8HI/V16QI extract when the vector is in memory
(define_insn_and_split "*vsx_extract_<mode>_var_load"
- [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r")
+ [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,v")
(unspec:<VS_scalar>
- [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q")
- (match_operand:DI 2 "gpc_reg_operand" "r")]
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
UNSPEC_VSX_EXTRACT))
- (clobber (match_scratch:DI 3 "=&b"))]
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
"VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
"#"
"&& reload_completed"
@@ -3927,7 +4084,113 @@
operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
operands[3], <VS_scalar>mode);
}
- [(set_attr "type" "load")])
+ [(set_attr "type" "load")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
+
+;; Variable V4SI/V8HI/V16QI vector extract when the vector is in a register and
+;; combine with zero extend
+(define_insn_and_split "*vsx_extract_<mode>_uns_di_var"
+ [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+ (zero_extend:DI
+ (unspec:<VSX_EXTRACT_I:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=r,r"))
+ (clobber (match_scratch:V2DI 4 "=X,&v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I:MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ machine_mode smode = <VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, REGNO (operands[0]));
+ rs6000_split_vec_extract_var (reg_small, operands[1], operands[2],
+ operands[3], operands[4]);
+ DONE;
+}
+ [(set_attr "isa" "p9v,*")])
+
+;; Variable V4SI vector extract when the vector is in memory, and combine with
+;; a sign or zero extend.
+(define_insn_and_split "*vsx_extract_v4si_<su><mode>_var_load"
+ [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,wa")
+ (any_extend:EXTSI
+      (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTSI (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, REGNO (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")])
+
+;; Variable V8HI vector extract when the vector is in memory, and combine with
+;; a sign or zero extend.
+(define_insn_and_split "*vsx_extract_v8hi_<su><mode>_var_load"
+ [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,v")
+ (any_extend:EXTHI
+      (unspec:HI
+ [(match_operand:V8HI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (V8HImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTHI (match_dup 4)))]
+{
+ rtx reg_hi = gen_rtx_REG (HImode, REGNO (operands[0]));
+ rtx mem = rs6000_adjust_vec_address (reg_hi, operands[1], operands[2],
+ operands[3], HImode);
+
+ /* Altivec load HImode does not have a sign extend version. */
+  if (int_reg_operand (operands[0], <MODE>mode) || <CODE> == ZERO_EXTEND)
+ operands[4] = mem;
+ else
+ {
+ emit_move_insn (reg_hi, mem);
+ operands[4] = reg_hi;
+ }
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
+
+;; Variable V16QI vector extract when the vector is in memory, and combine with
+;; a zero extend. There is no sign extend version of load byte.
+(define_insn_and_split "*vsx_extract_v16qi_u<mode>_var_load"
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,wa")
+    (zero_extend:EXTQI
+      (unspec:QI
+ [(match_operand:V16QI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V16QImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (zero_extend:EXTQI (match_dup 4)))]
+{
+ rtx reg_qi = gen_rtx_REG (QImode, REGNO (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_qi, operands[1], operands[2],
+ operands[3], QImode);
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
;; ISA 3.1 extract
(define_expand "vextractl<mode>"
@@ -4300,6 +4563,248 @@
}
[(set_attr "isa" "<FL_CONV:VSisa>")])
+;; Optimize <type> f = (<ftype>) vec_extract (V4SI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is constant.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_v4si_fl_<mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>,<VSX_EX_FL>")
+ (float:FL_CONV
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+(define_insn_and_split "*vsx_ext_v4si_ufl_<mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>,<VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+;; Optimize <type> f = (<ftype>) vec_extract (V4SI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is variable.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_v4si_fl_<mode>_var_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>")
+ (float:FL_CONV
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+(define_insn_and_split "*vsx_ext_v4si_ufl_<mode>_var_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+;; Optimize <type> f = (<ftype>) vec_extract (V8HI/V16QI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is constant.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand"
+ "=<FL_CONV:VSX_EX_FL>,<FL_CONV:VSX_EX_FL>")
+ (float:FL_CONV
+ (vec_select:<VSX_EXTRACT_I2:VS_scalar>
+ (match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))
+ (clobber (match_scratch:DI 4 "=v,v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 5)
+ (match_dup 6))
+ (set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ operands[5] = gen_rtx_REG (smode, reg_or_subregno (operands[4]));
+ operands[6] = rs6000_adjust_vec_address (operands[5], operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand"
+ "=<FL_CONV:VSX_EX_FL>,<FL_CONV:VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (vec_select:<VSX_EXTRACT_I2:VS_scalar>
+ (match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_small, operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+;; Optimize <type> f = (<ftype>) vec_extract (V8HI/V16QI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is variable.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_vl"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<FL_CONV:VSX_EX_FL>")
+ (float:FL_CONV
+ (unspec:<VSX_EXTRACT_I2:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))
+ (clobber (match_scratch:DI 4 "=v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 5)
+ (match_dup 6))
+ (set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ operands[5] = gen_rtx_REG (smode, reg_or_subregno (operands[4]));
+ operands[6] = rs6000_adjust_vec_address (operands[5], operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_vl"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<FL_CONV:VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (unspec:<VSX_EXTRACT_I2:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_small, operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
;; V4SI/V8HI/V16QI set operation on ISA 3.0
(define_insn "vsx_set_<mode>_p9"
[(set (match_operand:VSX_EXTRACT_I 0 "gpc_reg_operand" "=<VSX_EX>")
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
index f3b9556b2e6..555be18a3ea 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
@@ -21,7 +21,7 @@
/* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times "extsb" 2 } } */
/* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64 } } } */
/* multiple codegen variations for -m32. */
/* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
index 75eaf25943b..c9e9a26ab06 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
@@ -7,14 +7,14 @@
// Targeting P8 (LE) and (BE). 6 tests total.
// P8 LE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl)
-// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/5:rldicl))
+// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/2:rldicl))
// P8 BE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl)
// P8 BE variables: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/2:rldicl))
/* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 2 { target { lp64 && be } } } } */
/* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */
/* { dg-final { scan-assembler-times {\msldi\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
index 0ddecb4e4b5..2daebb86f21 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
@@ -24,7 +24,7 @@
/* { dg-final { scan-assembler-times "mfvsrd" 6 { target lp64 } } } */
/* { dg-final { scan-assembler-times "srdi" 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times "extsh" 2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "rlwinm" 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "rlwinm" 2 { target lp64 } } } */
/* -m32 codegen tests. */
/* { dg-final { scan-assembler-times {\mli\M} 6 { target ilp32 } } } */