public inbox for gcc-cvs@sourceware.org
* [gcc(refs/users/meissner/heads/work054)] PR 93230: Fold sign/zero extension into vec_extract.
From: Michael Meissner @ 2021-06-03 17:52 UTC (permalink / raw)
To: gcc-cvs
https://gcc.gnu.org/g:8303b8a41b4c5aed3fc874097391f64b93c78b08
commit 8303b8a41b4c5aed3fc874097391f64b93c78b08
Author: Michael Meissner <meissner@linux.ibm.com>
Date: Thu Jun 3 13:51:44 2021 -0400
PR 93230: Fold sign/zero extension into vec_extract.
gcc/
2021-06-03 Michael Meissner <meissner@linux.ibm.com>
PR target/93230
* config/rs6000/rs6000.c (rs6000_split_vec_extract_var): Remove
support for handling MEM, users call rs6000_adjust_vec_address
directly.
* config/rs6000/vsx.md (VSX_EX_FL): New mode attribute.
(vsx_extract_v4sf_<mode>_load): Rename to vsx_extract_v4sf_load.
(vsx_extract_v4sf_to_df_load): New insn to combine vec_extract of
SFmode from memory being converted to DFmode.
(vsx_extract_v4si_<su><mode>_load): New insn to support V4SI
vec_extract from memory being converted to DImode directly without
an extra sign/zero extension.
(vsx_extract_v8hi_<su><mode>_load): New insn to support V8HI
vec_extract from memory being converted to DImode directly without
an extra sign/zero extension.
(vsx_extract_v16qi_u<mode>_load): New insn to support V16QI
vec_extract from memory being converted to DImode directly without
an extra zero extension.
(vsx_extract_v4si_var_load): Split V4SI extract from other small
integers, and add support for loading up vector registers with
sign/zero extension directly.
(vsx_extract_<mode>_var_load, VSX_EXTRACT_I2 iterator): Split
V8HI/V16QI vector extract from memory to handle loading vector
registers in addition to GPR registers.
(vsx_extract_<mode>_uns_di_var): New insn to optimize extracting a
small integer from a vector in a register and zero extending it to
DImode.
(vsx_extract_v4si_<su><mode>_var_load): New insns to support
combining a V4SI variable vector extract from memory with sign or
zero extension.
(vsx_extract_v8hi_<su><mode>_var_load): New insns to support
combining a V8HI variable vector extract from memory with sign or
zero extension.
(vsx_extract_v16qi_u<mode>_var_load): New insns to support
combining a V16QI variable vector extract from memory with zero
extension.
(vsx_ext_v4si_fl_<mode>_load): New insn to support a V4SI vector
extract that is converted to floating point to avoid doing a
direct move.
(vsx_ext_v4si_ufl_<mode>_load): New insn to support an unsigned
V4SI vector extract that is converted to floating point to avoid
doing a direct move.
(vsx_ext_v4si_fl_<mode>_var_load): New insn to support a V4SI
variable vector extract that is converted to floating point to
avoid doing a direct move.
(vsx_ext_v4si_ufl_<mode>_var_load): New insn to support an
unsigned V4SI variable vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_load): New insns
to support a V8HI/V16QI vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_load): New insns
to support an unsigned V8HI/V16QI vector extract that is converted
to floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_vl): New insns to
support a variable V8HI/V16QI vector extract that is converted to
floating point to avoid doing a direct move.
(vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_vl): New insns
to support an unsigned variable V8HI/V16QI vector extract that is
converted to floating point to avoid doing a direct move.
gcc/testsuite/
2021-06-03 Michael Meissner <meissner@linux.ibm.com>
PR target/93230
* gcc.target/powerpc/fold-vec-extract-char.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/fold-vec-extract-int.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/fold-vec-extract-short.p8.c: Adjust
instruction counts.
* gcc.target/powerpc/pcrel-opt-inc-di.c: Fix typo.
Diff:
---
gcc/config/rs6000/rs6000.c | 13 +-
gcc/config/rs6000/vsx.md | 525 ++++++++++++++++++++-
| 2 +-
| 6 +-
| 2 +-
5 files changed, 521 insertions(+), 27 deletions(-)
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 420a1bb9521..c4f5f359692 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8026,18 +8026,7 @@ rs6000_split_vec_extract_var (rtx dest, rtx src, rtx element, rtx tmp_gpr,
gcc_assert (byte_shift >= 0);
- /* If we are given a memory address, optimize to load just the element. We
- don't have to adjust the vector element number on little endian
- systems. */
- if (MEM_P (src))
- {
- emit_move_insn (dest,
- rs6000_adjust_vec_address (dest, src, element, tmp_gpr,
- scalar_mode));
- return;
- }
-
- else if (REG_P (src) || SUBREG_P (src))
+ if (REG_P (src) || SUBREG_P (src))
{
int num_elements = GET_MODE_NUNITS (mode);
int bits_in_element = mode_to_bits (GET_MODE_INNER (mode));
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index bc708113865..b49d5b44573 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -253,6 +253,13 @@
(TF "TARGET_FLOAT128_HW
&& FLOAT128_IEEE_P (TFmode)")])
+;; Mode attribute to give the constraint for the floating point type for vector
+;; extract and convert to floating point operations.
+(define_mode_attr VSX_EX_FL [(SF "wa")
+ (DF "wa")
+ (KF "v")
+ (TF "v")])
+
;; Iterator for the 2 short vector types to do a splat from an integer
(define_mode_iterator VSX_SPLAT_I [V16QI V8HI])
@@ -3443,7 +3450,9 @@
DONE;
})
-;; Variable V2DI/V2DF extract from memory
+;; Variable V2DI/V2DF extract from memory. We separate these insns, because
+;; the compiler will sometimes have the vector value in a register, but then
+;; decide the best way to do this is to do a store and then a load.
(define_insn_and_split "*vsx_extract_<mode>_var_load"
[(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=wa,r")
(unspec:<VS_scalar> [(match_operand:VSX_D 1 "memory_operand" "Q,Q")
@@ -3494,7 +3503,7 @@
[(set_attr "length" "8")
(set_attr "type" "fp")])
-(define_insn_and_split "*vsx_extract_v4sf_<mode>_load"
+(define_insn_and_split "*vsx_extract_v4sf_load"
[(set (match_operand:SF 0 "register_operand" "=f,v,v,?r")
(vec_select:SF
(match_operand:V4SF 1 "memory_operand" "m,Z,m,m")
@@ -3512,7 +3521,29 @@
(set_attr "length" "8")
(set_attr "isa" "*,p7v,p9v,*")])
-;; Variable V4SF extract from a register
+;; V4SF extract to DFmode
+(define_insn_and_split "*vsx_extract_v4sf_to_df_load"
+ [(set (match_operand:DF 0 "register_operand" "=f,v")
+ (float_extend:DF
+ (vec_select:SF
+ (match_operand:V4SF 1 "memory_operand" "m,m")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,n")]))))
+ (clobber (match_scratch:P 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SFmode)"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (float_extend:DF (match_dup 4)))]
+{
+ rtx reg_sf = gen_rtx_REG (SFmode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_sf, operands[1], operands[2],
+ operands[3], SFmode);
+}
+ [(set_attr "type" "fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p8v")])
+
+;; Variable V4SF extract
(define_insn_and_split "vsx_extract_v4sf_var"
[(set (match_operand:SF 0 "gpc_reg_operand" "=wa")
(unspec:SF [(match_operand:V4SF 1 "gpc_reg_operand" "v")
@@ -3547,6 +3578,26 @@
}
[(set_attr "type" "fpload,load")])
+(define_insn_and_split "*vsx_extract_v4sf_to_df_var_load"
+ [(set (match_operand:DF 0 "gpc_reg_operand" "=wa")
+ (float_extend:DF
+ (unspec:SF [(match_operand:V4SF 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SFmode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (float_extend:DF (match_dup 4)))]
+{
+ rtx reg_sf = gen_rtx_REG (SFmode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_sf, operands[1], operands[2],
+ operands[3], SFmode);
+}
+ [(set_attr "type" "fpload")
+ (set_attr "length" "8")])
+
;; Expand the builtin form of xxpermdi to canonical rtl.
(define_expand "vsx_xxpermdi_<mode>"
[(match_operand:VSX_L 0 "vsx_register_operand")
@@ -3891,7 +3942,94 @@
[(set_attr "type" "load")
(set_attr "length" "8")])
-;; Variable V16QI/V8HI/V4SI extract from a register
+;; Optimize extracting and extending a single SI element from memory. GPRs
+;; take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register.
+(define_insn_and_split "*vsx_extract_v4si_<su><mode>_load"
+ [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,wa,wa")
+ (any_extend:EXTSI
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTSI (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8")])
+
+;; Optimize extracting and extending a single HI element from memory. GPRs
+;; take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register.
+(define_insn_and_split "*vsx_extract_v8hi_<su><mode>_load"
+ [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,v,v")
+ (any_extend:EXTHI
+ (vec_select:HI
+ (match_operand:V8HI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_7_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V8HImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTHI (match_dup 4)))]
+{
+ rtx reg_hi = gen_rtx_REG (HImode, reg_or_subregno (operands[0]));
+ rtx mem = rs6000_adjust_vec_address (reg_hi, operands[1], operands[2],
+ operands[3], HImode);
+
+ /* We don't have a sign extend to a vector register, so we have to do
+ the load first and then a sign extend operation. */
+ if (int_reg_operand (operands[0], <MODE>mode) || <CODE> == ZERO_EXTEND)
+ operands[4] = mem;
+
+ else
+ {
+ emit_move_insn (reg_hi, mem);
+ operands[4] = reg_hi;
+ }
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8,12,12")
+ (set_attr "isa" "*,p9v,p9v")])
+
+;; Optimize extracting and zero extending a single QI element from memory.
+;; GPRs take any address. If the element number is 0, we can use normal X-FORM
+;; (reg+reg) addressing to load up the vector register. Otherwise use Q to get
+;; a single register, so we can load the offset into the scratch register. We
+;; don't have either a GPR load or a vector load that does sign extension, so
+;; only do the zero_extend case.
+(define_insn_and_split "*vsx_extract_v16qi_u<mode>_load"
+ [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,v,v")
+ (zero_extend:EXTQI
+ (vec_select:QI
+ (match_operand:V16QI 1 "memory_operand" "m,Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_15_operand" "n,O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b,&b"))]
+ "VECTOR_MEM_VSX_P (V16QImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (zero_extend:EXTQI (match_dup 4)))]
+{
+ rtx reg_qi = gen_rtx_REG (QImode, reg_or_subregno (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_qi, operands[1], operands[2],
+ operands[3], QImode);
+}
+ [(set_attr "type" "load,fpload,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v,p9v")])
+
+;; Variable V16QI/V8HI/V4SI extract
(define_insn_and_split "vsx_extract_<mode>_var"
[(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,r")
(unspec:<VS_scalar>
@@ -3911,14 +4049,33 @@
}
[(set_attr "isa" "p9v,*")])
-;; Variable V16QI/V8HI/V4SI extract from memory
+;; Variable V4SI extract when the vector is in memory
+(define_insn_and_split "*vsx_extract_v4si_var_load"
+ [(set (match_operand:SI 0 "gpc_reg_operand" "=r,wa")
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0) (match_dup 4))]
+{
+ operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load")
+ (set_attr "length" "8")])
+
+;; Variable V8HI/V16QI extract when the vector is in memory
(define_insn_and_split "*vsx_extract_<mode>_var_load"
- [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r")
+ [(set (match_operand:<VS_scalar> 0 "gpc_reg_operand" "=r,v")
(unspec:<VS_scalar>
- [(match_operand:VSX_EXTRACT_I 1 "memory_operand" "Q")
- (match_operand:DI 2 "gpc_reg_operand" "r")]
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
UNSPEC_VSX_EXTRACT))
- (clobber (match_scratch:DI 3 "=&b"))]
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
"VECTOR_MEM_VSX_P (<MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
"#"
"&& reload_completed"
@@ -3927,7 +4084,113 @@
operands[4] = rs6000_adjust_vec_address (operands[0], operands[1], operands[2],
operands[3], <VS_scalar>mode);
}
- [(set_attr "type" "load")])
+ [(set_attr "type" "load")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
+
+;; Variable V4SI/V8HI/V16QI vector extract when the vector is in a register and
+;; combine with zero extend
+(define_insn_and_split "*vsx_extract_<mode>_uns_di_var"
+ [(set (match_operand:DI 0 "gpc_reg_operand" "=r,r")
+ (zero_extend:DI
+ (unspec:<VSX_EXTRACT_I:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I 1 "gpc_reg_operand" "v,v")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=r,r"))
+ (clobber (match_scratch:V2DI 4 "=X,&v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I:MODE>mode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(const_int 0)]
+{
+ machine_mode smode = <VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, REGNO (operands[0]));
+ rs6000_split_vec_extract_var (reg_small, operands[1], operands[2],
+ operands[3], operands[4]);
+ DONE;
+}
+ [(set_attr "isa" "p9v,*")])
+
+;; Variable V4SI vector extract when the vector is in memory, and combine with
+;; a sign or zero extend.
+(define_insn_and_split "*vsx_extract_v4si_<su><mode>_var_load"
+ [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,wa")
+ (any_extend:EXTSI
+      (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTSI (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, REGNO (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")])
+
+;; Variable V8HI vector extract when the vector is in memory, and combine with
+;; a sign or zero extend.
+(define_insn_and_split "*vsx_extract_v8hi_<su><mode>_var_load"
+ [(set (match_operand:EXTHI 0 "gpc_reg_operand" "=r,v")
+ (any_extend:EXTHI
+      (unspec:HI
+ [(match_operand:V8HI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+  "VECTOR_MEM_VSX_P (V8HImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (any_extend:EXTHI (match_dup 4)))]
+{
+ rtx reg_hi = gen_rtx_REG (HImode, REGNO (operands[0]));
+ rtx mem = rs6000_adjust_vec_address (reg_hi, operands[1], operands[2],
+ operands[3], HImode);
+
+ /* Altivec load HImode does not have a sign extend version. */
+  if (int_reg_operand (operands[0], <MODE>mode) || <CODE> == ZERO_EXTEND)
+ operands[4] = mem;
+ else
+ {
+ emit_move_insn (reg_hi, mem);
+ operands[4] = reg_hi;
+ }
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
+
+;; Variable V16QI vector extract when the vector is in memory, and combine with
+;; a zero extend. There is no sign extend version of load byte.
+(define_insn_and_split "*vsx_extract_v16qi_u<mode>_var_load"
+  [(set (match_operand:EXTQI 0 "gpc_reg_operand" "=r,wa")
+    (zero_extend:EXTQI
+      (unspec:QI
+ [(match_operand:V16QI 1 "memory_operand" "Q,Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r,r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V16QImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 0)
+ (zero_extend:EXTQI (match_dup 4)))]
+{
+ rtx reg_qi = gen_rtx_REG (QImode, REGNO (operands[0]));
+ operands[4] = rs6000_adjust_vec_address (reg_qi, operands[1], operands[2],
+ operands[3], QImode);
+}
+ [(set_attr "type" "load,fpload")
+ (set_attr "length" "8")
+ (set_attr "isa" "*,p9v")])
;; ISA 3.1 extract
(define_expand "vextractl<mode>"
@@ -4300,6 +4563,248 @@
}
[(set_attr "isa" "<FL_CONV:VSisa>")])
+;; Optimize <type> f = (<ftype>) vec_extract (V4SI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is constant.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_v4si_fl_<mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>,<VSX_EX_FL>")
+ (float:FL_CONV
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+(define_insn_and_split "*vsx_ext_v4si_ufl_<mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>,<VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (vec_select:SI
+ (match_operand:V4SI 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "const_0_to_3_operand" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+;; Optimize <type> f = (<ftype>) vec_extract (V4SI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is variable.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_v4si_fl_<mode>_var_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>")
+ (float:FL_CONV
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+(define_insn_and_split "*vsx_ext_v4si_ufl_<mode>_var_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (unspec:SI
+ [(match_operand:V4SI 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (V4SImode) && TARGET_DIRECT_MOVE_64BIT"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ rtx reg_si = gen_rtx_REG (SImode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_si, operands[1], operands[2],
+ operands[3], SImode);
+}
+ [(set_attr "isa" "<FL_CONV:VSisa>")])
+
+;; Optimize <type> f = (<ftype>) vec_extract (V8HI/V16QI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is constant.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand"
+ "=<FL_CONV:VSX_EX_FL>,<FL_CONV:VSX_EX_FL>")
+ (float:FL_CONV
+ (vec_select:<VSX_EXTRACT_I2:VS_scalar>
+ (match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))
+ (clobber (match_scratch:DI 4 "=v,v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 5)
+ (match_dup 6))
+ (set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ operands[5] = gen_rtx_REG (smode, reg_or_subregno (operands[4]));
+ operands[6] = rs6000_adjust_vec_address (operands[5], operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_load"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand"
+ "=<FL_CONV:VSX_EX_FL>,<FL_CONV:VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (vec_select:<VSX_EXTRACT_I2:VS_scalar>
+ (match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Z,Q")
+ (parallel [(match_operand:QI 2 "<VSX_EXTRACT_PREDICATE>" "O,n")]))))
+ (clobber (match_scratch:DI 3 "=&b,&b"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_small, operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+;; Optimize <type> f = (<ftype>) vec_extract (V8HI/V16QI, <n>).
+;;
+;; <ftype> is a hardware floating point type for which conversions are
+;; directly supported (SFmode, DFmode, KFmode, maybe TFmode).
+;;
+;; The element number (<n>) is variable.
+;;
+;; The vector is in memory, and we convert the vector extraction to a load to
+;; the VSX registers and then convert, avoiding a direct move.
+;;
+;; For SFmode/DFmode, we can use all vector registers. For KFmode/TFmode, we
+;; have to use only the Altivec registers.
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_fl_<FL_CONV:mode>_vl"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<FL_CONV:VSX_EX_FL>")
+ (float:FL_CONV
+ (unspec:<VSX_EXTRACT_I2:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))
+ (clobber (match_scratch:DI 4 "=v"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 5)
+ (match_dup 6))
+ (set (match_dup 4)
+ (sign_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ operands[5] = gen_rtx_REG (smode, reg_or_subregno (operands[4]));
+ operands[6] = rs6000_adjust_vec_address (operands[5], operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
+(define_insn_and_split "*vsx_ext_<VSX_EXTRACT_I2:mode>_ufl_<FL_CONV:mode>_vl"
+ [(set (match_operand:FL_CONV 0 "gpc_reg_operand" "=<FL_CONV:VSX_EX_FL>")
+ (unsigned_float:FL_CONV
+ (unspec:<VSX_EXTRACT_I2:VS_scalar>
+ [(match_operand:VSX_EXTRACT_I2 1 "memory_operand" "Q")
+ (match_operand:DI 2 "gpc_reg_operand" "r")]
+ UNSPEC_VSX_EXTRACT)))
+ (clobber (match_scratch:DI 3 "=&b"))]
+ "VECTOR_MEM_VSX_P (<VSX_EXTRACT_I2:MODE>mode) && TARGET_POWERPC64
+ && TARGET_P9_VECTOR"
+ "#"
+ "&& reload_completed"
+ [(set (match_dup 4)
+ (zero_extend:DI (match_dup 5)))
+ (set (match_dup 0)
+ (unsigned_float:<FL_CONV:MODE> (match_dup 4)))]
+{
+ machine_mode smode = <VSX_EXTRACT_I2:VS_scalar>mode;
+ rtx reg_small = gen_rtx_REG (smode, reg_or_subregno (operands[0]));
+ operands[4] = gen_rtx_REG (DImode, reg_or_subregno (operands[0]));
+ operands[5] = rs6000_adjust_vec_address (reg_small, operands[1],
+ operands[2], operands[3],
+ smode);
+})
+
;; V4SI/V8HI/V16QI set operation on ISA 3.0
(define_insn "vsx_set_<mode>_p9"
[(set (match_operand:VSX_EXTRACT_I 0 "gpc_reg_operand" "=<VSX_EX>")
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
index f3b9556b2e6..555be18a3ea 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-char.p8.c
@@ -21,7 +21,7 @@
/* { dg-final { scan-assembler-times {\msrdi\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times "extsb" 2 } } */
/* { dg-final { scan-assembler-times {\mvspltb\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mrlwinm\M} 2 { target lp64 } } } */
/* multiple codegen variations for -m32. */
/* { dg-final { scan-assembler-times {\mrlwinm\M} 3 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
index 75eaf25943b..c9e9a26ab06 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-int.p8.c
@@ -7,14 +7,14 @@
// Targeting P8 (LE) and (BE). 6 tests total.
// P8 LE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl)
-// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/5:rldicl))
+// P8 LE variables: subfic, sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/2:rldicl))
// P8 BE constant: vspltw, mfvsrwz, (1:extsw/2:rldicl)
// P8 BE variables: sldi, mtvsrd, xxpermdi, vslo, mfvsrd, sradi, (1:extsw/2:rldicl))
/* { dg-final { scan-assembler-times {\mvspltw\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times {\mmfvsrwz\M} 3 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 7 { target { le } } } } */
-/* { dg-final { scan-assembler-times {\mrldicl\M} 4 { target { lp64 && be } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 5 { target { le } } } } */
+/* { dg-final { scan-assembler-times {\mrldicl\M} 2 { target { lp64 && be } } } } */
/* { dg-final { scan-assembler-times {\msubfic\M} 3 { target { le } } } } */
/* { dg-final { scan-assembler-times {\msldi\M} 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times {\mmtvsrd\M} 3 { target lp64 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
index 0ddecb4e4b5..2daebb86f21 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-extract-short.p8.c
@@ -24,7 +24,7 @@
/* { dg-final { scan-assembler-times "mfvsrd" 6 { target lp64 } } } */
/* { dg-final { scan-assembler-times "srdi" 3 { target lp64 } } } */
/* { dg-final { scan-assembler-times "extsh" 2 { target lp64 } } } */
-/* { dg-final { scan-assembler-times "rlwinm" 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times "rlwinm" 2 { target lp64 } } } */
/* -m32 codegen tests. */
/* { dg-final { scan-assembler-times {\mli\M} 6 { target ilp32 } } } */