public inbox for gcc-patches@gcc.gnu.org
* [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET
@ 2020-10-10  8:08 Xionghu Luo
  2020-10-10  8:08 ` [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param Xionghu Luo
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Xionghu Luo @ 2020-10-10  8:08 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

This series originated from
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
with the patch split up and refined per review comments.

The patch adding IFN VEC_SET for ARRAY_REF(VIEW_CONVERT_EXPR) has been
committed; this patch set enables expanding IFN VEC_SET on Power9 and
Power8 with specific instruction sequences.

Xionghu Luo (4):
  rs6000: Change rs6000_expand_vector_set param
  rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
  rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  rs6000: Update testcases' instruction count

 gcc/config/rs6000/rs6000-c.c                  |  44 +++--
 gcc/config/rs6000/rs6000-call.c               |   2 +-
 gcc/config/rs6000/rs6000-protos.h             |   3 +-
 gcc/config/rs6000/rs6000.c                    | 181 +++++++++++++++++-
 gcc/config/rs6000/vector.md                   |   4 +-
 .../powerpc/fold-vec-insert-char-p8.c         |   8 +-
 .../powerpc/fold-vec-insert-char-p9.c         |  12 +-
 .../powerpc/fold-vec-insert-double.c          |  11 +-
 .../powerpc/fold-vec-insert-float-p8.c        |   6 +-
 .../powerpc/fold-vec-insert-float-p9.c        |  10 +-
 .../powerpc/fold-vec-insert-int-p8.c          |   6 +-
 .../powerpc/fold-vec-insert-int-p9.c          |  11 +-
 .../powerpc/fold-vec-insert-longlong.c        |  10 +-
 .../powerpc/fold-vec-insert-short-p8.c        |   6 +-
 .../powerpc/fold-vec-insert-short-p9.c        |   8 +-
 .../gcc.target/powerpc/pr79251-run.c          |  28 +++
 gcc/testsuite/gcc.target/powerpc/pr79251.h    |  19 ++
 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 ++
 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c |  18 ++
 .../gcc.target/powerpc/vsx-builtin-7.c        |   4 +-
 20 files changed, 337 insertions(+), 71 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c

-- 
2.25.1



* [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param
  2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
@ 2020-10-10  8:08 ` Xionghu Luo
  2020-11-24 19:44   ` Segher Boessenkool
  2020-10-10  8:08 ` [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251] Xionghu Luo
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-10-10  8:08 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

rs6000_expand_vector_set can insert at either a constant or a variable
position, so pass the index as an rtx and change the vec_set<mode> index
operand to reg_or_cint_operand.

gcc/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/rs6000-call.c (altivec_expand_vec_set_builtin):
	Change call param 2 from type int to rtx.
	* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set):
	Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_vector_init):
	Change call param 2 from type int to rtx.
	(rs6000_expand_vector_set): Likewise.
	* config/rs6000/vector.md (vec_set<mode>): Support both constant
	and variable index vec_set.
---
 gcc/config/rs6000/rs6000-call.c   |  2 +-
 gcc/config/rs6000/rs6000-protos.h |  2 +-
 gcc/config/rs6000/rs6000.c        | 16 +++++++++-------
 gcc/config/rs6000/vector.md       |  4 ++--
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
index a8b520834c7..2608a2a0797 100644
--- a/gcc/config/rs6000/rs6000-call.c
+++ b/gcc/config/rs6000/rs6000-call.c
@@ -10655,7 +10655,7 @@ altivec_expand_vec_set_builtin (tree exp)
   op0 = force_reg (tmode, op0);
   op1 = force_reg (mode1, op1);
 
-  rs6000_expand_vector_set (op0, op1, elt);
+  rs6000_expand_vector_set (op0, op1, GEN_INT (elt));
 
   return op0;
 }
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 25fa5dd57cd..3578136e79b 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -57,7 +57,7 @@ extern bool rs6000_move_128bit_ok_p (rtx []);
 extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
 extern void rs6000_expand_vector_init (rtx, rtx);
-extern void rs6000_expand_vector_set (rtx, rtx, int);
+extern void rs6000_expand_vector_set (rtx, rtx, rtx);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 375fff59928..a5b59395abd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6678,7 +6678,8 @@ rs6000_expand_vector_init (rtx target, rtx vals)
       rs6000_expand_vector_init (target, copy);
 
       /* Insert variable.  */
-      rs6000_expand_vector_set (target, XVECEXP (vals, 0, one_var), one_var);
+      rs6000_expand_vector_set (target, XVECEXP (vals, 0, one_var),
+				GEN_INT (one_var));
       return;
     }
 
@@ -6692,10 +6693,10 @@ rs6000_expand_vector_init (rtx target, rtx vals)
   emit_move_insn (target, mem);
 }
 
-/* Set field ELT of TARGET to VAL.  */
+/* Set field ELT_RTX of TARGET to VAL.  */
 
 void
-rs6000_expand_vector_set (rtx target, rtx val, int elt)
+rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 {
   machine_mode mode = GET_MODE (target);
   machine_mode inner_mode = GET_MODE_INNER (mode);
@@ -6709,7 +6710,6 @@ rs6000_expand_vector_set (rtx target, rtx val, int elt)
   if (VECTOR_MEM_VSX_P (mode))
     {
       rtx insn = NULL_RTX;
-      rtx elt_rtx = GEN_INT (elt);
 
       if (mode == V2DFmode)
 	insn = gen_vsx_set_v2df (target, target, val, elt_rtx);
@@ -6736,8 +6736,11 @@ rs6000_expand_vector_set (rtx target, rtx val, int elt)
 	}
     }
 
+  gcc_assert (CONST_INT_P (elt_rtx));
+
   /* Simplify setting single element vectors like V1TImode.  */
-  if (GET_MODE_SIZE (mode) == GET_MODE_SIZE (inner_mode) && elt == 0)
+  if (GET_MODE_SIZE (mode) == GET_MODE_SIZE (inner_mode)
+      && INTVAL (elt_rtx) == 0)
     {
       emit_move_insn (target, gen_lowpart (mode, val));
       return;
@@ -6760,8 +6763,7 @@ rs6000_expand_vector_set (rtx target, rtx val, int elt)
 
   /* Set permute mask to insert element into target.  */
   for (i = 0; i < width; ++i)
-    XVECEXP (mask, 0, elt*width + i)
-      = GEN_INT (i + 0x10);
+    XVECEXP (mask, 0, INTVAL (elt_rtx) * width + i) = GEN_INT (i + 0x10);
   x = gen_rtx_CONST_VECTOR (V16QImode, XVEC (mask, 0));
 
   if (BYTES_BIG_ENDIAN)
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index 796345c80d3..7aab1887cf5 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -1227,10 +1227,10 @@ (define_expand "vec_init<mode><VEC_base_l>"
 (define_expand "vec_set<mode>"
   [(match_operand:VEC_E 0 "vlogical_operand")
    (match_operand:<VEC_base> 1 "register_operand")
-   (match_operand 2 "const_int_operand")]
+   (match_operand 2 "reg_or_cint_operand")]
   "VECTOR_MEM_ALTIVEC_OR_VSX_P (<MODE>mode)"
 {
-  rs6000_expand_vector_set (operands[0], operands[1], INTVAL (operands[2]));
+  rs6000_expand_vector_set (operands[0], operands[1], operands[2]);
   DONE;
 })
 
-- 
2.25.1



* [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
  2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
  2020-10-10  8:08 ` [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param Xionghu Luo
@ 2020-10-10  8:08 ` Xionghu Luo
  2020-11-24 22:37   ` Segher Boessenkool
  2020-10-10  8:08 ` [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8 Xionghu Luo
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-10-10  8:08 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

vec_insert accepts 3 arguments: arg0 is the value to insert, arg1 is the
input vector, and arg2 is the position at which arg0 is inserted into arg1.
When arg2 is variable rather than constant, the current expander generates
stxv+stwx+lxv, which causes a serious store-hit-load performance issue on
Power.  This patch:
 1) Builds a VIEW_CONVERT_EXPR for vec_insert (i, v, n), like v[n&3] = i, to
unify the gimple code, so that the expander can use vec_set_optab.
 2) Expands IFN VEC_SET to fast instructions: lvsr+insert+lvsl.
This way, "vec_insert (i, v, n)" and "v[n&3] = i" are not expanded too
early in the gimple stage when arg2 is variable, which avoids generating
store-hit-load instructions.

For Power9 V4SI:
	addi 9,1,-16
	rldic 6,6,2,60
	stxv 34,-16(1)
	stwx 5,9,6
	lxv 34,-16(1)
=>
	rlwinm 6,6,2,28,29
	mtvsrwz 0,5
	lvsr 1,0,6
	lvsl 0,0,6
	xxperm 34,34,33
	xxinsertw 34,0,12
	xxperm 34,34,32

Although the instruction count increases from 5 to 7, performance improves
by 60% in typical cases.
Tested with V2DI, V2DF, V4SI, V4SF, V8HI and V16QI on Power9-LE.

gcc/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
	Adjust variable index vec_insert from address dereference to
	ARRAY_REF(VIEW_CONVERT_EXPR) tree expression.
	* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set_var):
	New declaration.
	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): New function.
	* config/rs6000/vector.md (vec_set<mode>): Support both constant
	and variable index vec_set.

gcc/testsuite/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gcc.target/powerpc/pr79251.p9.c: New test.
	* gcc.target/powerpc/pr79251-run.c: New test.
	* gcc.target/powerpc/pr79251.h: New header.
---
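
As a reading aid, the rotate/insert/rotate idea behind the lvsr+insert+lvsl
sequence described above can be modeled in portable C.  This is only a
minimal sketch under stated assumptions, not the RTL this patch emits: the
names rotate_left_bytes and insert_var_model are hypothetical, the model
works on bytes in memory order, it inserts at byte 0 rather than at the
fixed slot the real insert instruction uses, and it ignores endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Hypothetical helper: rotate SRC left by N bytes into DST, so that
   byte N of SRC lands at byte 0 of DST.  */
static void
rotate_left_bytes (uint8_t dst[VEC_BYTES], const uint8_t src[VEC_BYTES],
                   unsigned n)
{
  for (unsigned i = 0; i < VEC_BYTES; i++)
    dst[i] = src[(i + n) % VEC_BYTES];
}

/* Hypothetical model of a variable-index insert: rotate the selected
   element to a fixed slot, overwrite that slot, then rotate back.
   WIDTH is the element size in bytes; IDX is reduced modulo the element
   count, mirroring the "n & 3" in the V4SI example.  */
static void
insert_var_model (uint8_t vec[VEC_BYTES], const void *val, unsigned width,
                  unsigned idx)
{
  assert (width >= 1 && width <= VEC_BYTES && VEC_BYTES % width == 0);

  unsigned nelts = VEC_BYTES / width;
  unsigned byte_off = (idx % nelts) * width;   /* idx * width, in bytes */
  uint8_t tmp[VEC_BYTES];

  /* First permute: element IDX now occupies bytes 0 .. width-1.  */
  rotate_left_bytes (tmp, vec, byte_off);

  /* Insert at a constant position.  */
  memcpy (tmp, val, width);

  /* Second permute: undo the rotation.  */
  rotate_left_bytes (vec, tmp, (VEC_BYTES - byte_off) % VEC_BYTES);
}
```

The point of the design is that the vector never leaves the register file
in the real sequence, so no store is followed by a dependent load; the
model above only mirrors the data movement, not the cost.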
 gcc/config/rs6000/rs6000-c.c                  | 25 ++++-----
 gcc/config/rs6000/rs6000-protos.h             |  1 +
 gcc/config/rs6000/rs6000.c                    | 53 +++++++++++++++++++
 .../gcc.target/powerpc/pr79251-run.c          | 28 ++++++++++
 gcc/testsuite/gcc.target/powerpc/pr79251.h    | 19 +++++++
 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c | 18 +++++++
 6 files changed, 130 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index cc1e997524e..5551a21d738 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1512,9 +1512,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
       tree arg1;
       tree arg2;
       tree arg1_type;
-      tree arg1_inner_type;
       tree decl, stmt;
-      tree innerptrtype;
       machine_mode mode;
 
       /* No second or third arguments. */
@@ -1566,8 +1564,13 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	  return build_call_expr (call, 3, arg1, arg0, arg2);
 	}
 
-      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0. */
-      arg1_inner_type = TREE_TYPE (arg1_type);
+      /* Build *(((arg1_inner_type*)&(vector type){arg1})+arg2) = arg0 with
+	 VIEW_CONVERT_EXPR.  i.e.:
+	 D.3192 = v1;
+	 _1 = n & 3;
+	 VIEW_CONVERT_EXPR<int[4]>(D.3192)[_1] = i;
+	 v1 = D.3192;
+	 D.3194 = v1;  */
       if (TYPE_VECTOR_SUBPARTS (arg1_type) == 1)
 	arg2 = build_int_cst (TREE_TYPE (arg2), 0);
       else
@@ -1582,6 +1585,7 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
       TREE_USED (decl) = 1;
       TREE_TYPE (decl) = arg1_type;
       TREE_READONLY (decl) = TYPE_READONLY (arg1_type);
+      TREE_ADDRESSABLE (decl) = 1;
       if (c_dialect_cxx ())
 	{
 	  stmt = build4 (TARGET_EXPR, arg1_type, decl, arg1,
@@ -1592,19 +1596,12 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	{
 	  DECL_INITIAL (decl) = arg1;
 	  stmt = build1 (DECL_EXPR, arg1_type, decl);
-	  TREE_ADDRESSABLE (decl) = 1;
 	  SET_EXPR_LOCATION (stmt, loc);
 	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
 	}
-
-      innerptrtype = build_pointer_type (arg1_inner_type);
-
-      stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
-      stmt = convert (innerptrtype, stmt);
-      stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
-      stmt = build_indirect_ref (loc, stmt, RO_NULL);
-      stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
-		     convert (TREE_TYPE (stmt), arg0));
+      stmt = build_array_ref (loc, stmt, arg2);
+      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
+			  convert (TREE_TYPE (stmt), arg0));
       stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
       return stmt;
     }
diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index 3578136e79b..4b6131a5145 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -58,6 +58,7 @@ extern bool rs6000_split_128bit_ok_p (rtx []);
 extern void rs6000_expand_float128_convert (rtx, rtx, bool);
 extern void rs6000_expand_vector_init (rtx, rtx);
 extern void rs6000_expand_vector_set (rtx, rtx, rtx);
+extern void rs6000_expand_vector_set_var (rtx, rtx, rtx);
 extern void rs6000_expand_vector_extract (rtx, rtx, rtx);
 extern void rs6000_split_vec_extract_var (rtx, rtx, rtx, rtx, rtx);
 extern rtx rs6000_adjust_vec_address (rtx, rtx, rtx, rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a5b59395abd..96f76c7a74c 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6709,6 +6709,12 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 
   if (VECTOR_MEM_VSX_P (mode))
     {
+      if (!CONST_INT_P (elt_rtx))
+	{
+	  rs6000_expand_vector_set_var (target, val, elt_rtx);
+	  return;
+	}
+
       rtx insn = NULL_RTX;
 
       if (mode == V2DFmode)
@@ -6799,6 +6805,53 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
   emit_insn (gen_rtx_SET (target, x));
 }
 
+/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
+   is variable and also counts by vector element size.  */
+
+void
+rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  machine_mode mode = GET_MODE (target);
+
+  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
+
+  gcc_assert (GET_MODE (idx) == E_SImode);
+
+  machine_mode inner_mode = GET_MODE (val);
+
+  rtx tmp = gen_reg_rtx (GET_MODE (idx));
+  int width = GET_MODE_SIZE (inner_mode);
+
+  gcc_assert (width >= 1 && width <= 8);
+
+  int shift = exact_log2 (width);
+  /* Generate the IDX for permute shift, width is the vector element size.
+     idx = idx * width.  */
+  emit_insn (gen_ashlsi3 (tmp, idx, GEN_INT (shift)));
+
+  tmp = convert_modes (DImode, SImode, tmp, 1);
+
+  /*  lvsr    v1,0,idx.  */
+  rtx pcvr = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_lvsr_reg (pcvr, tmp));
+
+  /*  lvsl    v2,0,idx.  */
+  rtx pcvl = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_lvsl_reg (pcvl, tmp));
+
+  rtx sub_target = simplify_gen_subreg (V16QImode, target, mode, 0);
+
+  rtx permr
+    = gen_altivec_vperm_v8hiv16qi (sub_target, sub_target, sub_target, pcvr);
+  emit_insn (permr);
+
+  rs6000_expand_vector_set (target, val, const0_rtx);
+
+  rtx perml
+    = gen_altivec_vperm_v8hiv16qi (sub_target, sub_target, sub_target, pcvl);
+  emit_insn (perml);
+}
+
 /* Extract field ELT from VEC into TARGET.  */
 
 void
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251-run.c b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
new file mode 100644
index 00000000000..08f69df1146
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251-run.c
@@ -0,0 +1,28 @@
+/* { dg-options "-O2 -maltivec" } */
+
+#include <stddef.h>
+#include <altivec.h>
+#include "pr79251.h"
+
+TEST_VEC_INSERT_ALL (test)
+
+#define run_test(TYPE, num)                                                    \
+  {                                                                            \
+    vector TYPE v;                                                             \
+    vector TYPE u = {0x0};                                                     \
+    for (long k = 0; k < 16 / sizeof (TYPE); k++)                              \
+      v[k] = 0xaa;                                                             \
+    for (long k = 0; k < 16 / sizeof (TYPE); k++)                              \
+      {                                                                        \
+	u = test##num (v, 254, k);                                             \
+	if (u[k] != (TYPE) 254)                                                \
+	  __builtin_abort ();                                                  \
+      }                                                                        \
+  }
+
+int
+main (void)
+{
+  TEST_VEC_INSERT_ALL (run_test)
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.h b/gcc/testsuite/gcc.target/powerpc/pr79251.h
new file mode 100644
index 00000000000..addb067f9ed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251.h
@@ -0,0 +1,19 @@
+
+#define test(TYPE, num)                                                        \
+  __attribute__ ((noinline, noclone))                                          \
+    vector TYPE test##num (vector TYPE v, TYPE i, signed int n)                \
+  {                                                                            \
+    return vec_insert (i, v, n);                                               \
+  }
+
+#define TEST_VEC_INSERT_ALL(T)                                                 \
+  T (char, 0)                                                                  \
+  T (unsigned char, 1)                                                         \
+  T (short, 2)                                                                 \
+  T (unsigned short, 3)                                                        \
+  T (int, 4)                                                                   \
+  T (unsigned int, 5)                                                          \
+  T (long long, 6)                                                             \
+  T (unsigned long long, 7)                                                    \
+  T (float, 8)                                                                 \
+  T (double, 9)
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c
new file mode 100644
index 00000000000..ec1cb255888
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p9.c
@@ -0,0 +1,18 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power9 -maltivec" } */
+
+#include <stddef.h>
+#include <altivec.h>
+#include "pr79251.h"
+
+TEST_VEC_INSERT_ALL (test)
+
+/* { dg-final { scan-assembler-not {\mstxw\M} } } */
+/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlvsr\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mxxperm\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mxxinsertw\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mvinserth\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mvinsertb\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 3 } } */
-- 
2.25.1



* [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
  2020-10-10  8:08 ` [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param Xionghu Luo
  2020-10-10  8:08 ` [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251] Xionghu Luo
@ 2020-10-10  8:08 ` Xionghu Luo
  2020-11-27  1:04   ` Xionghu Luo
  2021-01-21 23:48   ` Segher Boessenkool
  2020-10-10  8:08 ` [PATCH 4/4] rs6000: Update testcases' instruction count Xionghu Luo
  2020-11-05  1:34 ` Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
  4 siblings, 2 replies; 21+ messages in thread
From: Xionghu Luo @ 2020-10-10  8:08 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

gcc/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
	Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
	platforms.
	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
	to call different path for P8 and P9.
	(rs6000_expand_vector_set_var_p9): New function.
	(rs6000_expand_vector_set_var_p8): New function.

gcc/testsuite/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gcc.target/powerpc/pr79251.p8.c: New test.
---
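
As a reading aid, the Power8 path in this patch can be understood as a
mask-and-select: build a value vector and an all-ones mask covering one
element, move both to the selected position with the same permute, then
merge them into the target with a select.  The following is a minimal
portable C sketch of that idea, not the emitted RTL: the names
rotate_right_bytes and insert_select_model are hypothetical, the model
works on bytes in memory order, and it ignores the endian-specific offsets
the real code computes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Hypothetical helper: rotate SRC right by N bytes into DST, so that
   byte 0 of SRC lands at byte N of DST.  */
static void
rotate_right_bytes (uint8_t dst[VEC_BYTES], const uint8_t src[VEC_BYTES],
                    unsigned n)
{
  for (unsigned i = 0; i < VEC_BYTES; i++)
    dst[(i + n) % VEC_BYTES] = src[i];
}

/* Hypothetical model of the mask-and-select insert.  WIDTH is the
   element size in bytes; IDX is reduced modulo the element count.  */
static void
insert_select_model (uint8_t vec[VEC_BYTES], const void *val,
                     unsigned width, unsigned idx)
{
  assert (width >= 1 && width <= VEC_BYTES && VEC_BYTES % width == 0);

  unsigned byte_off = (idx % (VEC_BYTES / width)) * width;
  uint8_t mask[VEC_BYTES] = {0};
  uint8_t value[VEC_BYTES] = {0};
  uint8_t mask_perm[VEC_BYTES];
  uint8_t value_perm[VEC_BYTES];

  /* Mask and value start at a fixed slot (bytes 0 .. width-1).  */
  memset (mask, 0xff, width);
  memcpy (value, val, width);

  /* Apply the same permute to both, moving them to the target slot.  */
  rotate_right_bytes (mask_perm, mask, byte_off);
  rotate_right_bytes (value_perm, value, byte_off);

  /* Select: take VALUE where the mask is set, keep VEC elsewhere.  */
  for (unsigned i = 0; i < VEC_BYTES; i++)
    vec[i] = (uint8_t) ((vec[i] & ~mask_perm[i])
                        | (value_perm[i] & mask_perm[i]));
}
```

Unlike the rotate/insert/rotate form, this variant needs no element insert
instruction, only a permute and a select, which is presumably why it suits
targets without the Power9 insert instructions.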
 gcc/config/rs6000/rs6000-c.c                  |  27 +++-
 gcc/config/rs6000/rs6000.c                    | 117 +++++++++++++++++-
 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +++
 3 files changed, 155 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c

diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index 5551a21d738..4bea8001ec6 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -1599,10 +1599,29 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
 	  SET_EXPR_LOCATION (stmt, loc);
 	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
 	}
-      stmt = build_array_ref (loc, stmt, arg2);
-      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
-			  convert (TREE_TYPE (stmt), arg0));
-      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
+
+      if (TARGET_P8_VECTOR)
+	{
+	  stmt = build_array_ref (loc, stmt, arg2);
+	  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
+			      convert (TREE_TYPE (stmt), arg0));
+	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
+	}
+      else
+	{
+	  tree arg1_inner_type;
+	  tree innerptrtype;
+	  arg1_inner_type = TREE_TYPE (arg1_type);
+	  innerptrtype = build_pointer_type (arg1_inner_type);
+
+	  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
+	  stmt = convert (innerptrtype, stmt);
+	  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
+	  stmt = build_indirect_ref (loc, stmt, RO_NULL);
+	  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
+			 convert (TREE_TYPE (stmt), arg0));
+	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
+	}
       return stmt;
     }
 
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 96f76c7a74c..33ca839cb28 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -6806,10 +6806,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
 }
 
 /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
-   is variable and also counts by vector element size.  */
+   is variable and also counts by vector element size for p9 and above.  */
 
 void
-rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
+rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
 {
   machine_mode mode = GET_MODE (target);
 
@@ -6852,6 +6852,119 @@ rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
   emit_insn (perml);
 }
 
+/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
+   is variable and also counts by vector element size for p8.  */
+
+void
+rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
+{
+  machine_mode mode = GET_MODE (target);
+
+  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
+
+  gcc_assert (GET_MODE (idx) == E_SImode);
+
+  machine_mode inner_mode = GET_MODE (val);
+  HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
+
+  rtx tmp = gen_reg_rtx (GET_MODE (idx));
+  int width = GET_MODE_SIZE (inner_mode);
+
+  gcc_assert (width >= 1 && width <= 4);
+
+  if (!BYTES_BIG_ENDIAN)
+    {
+      /*  idx = idx * width.  */
+      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
+      /*  idx = idx + 8.  */
+      emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
+    }
+  else
+    {
+      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
+      emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
+    }
+
+  /*  lxv vs33, mask.
+      DImode: 0xffffffffffffffff0000000000000000
+      SImode: 0x00000000ffffffff0000000000000000
+      HImode: 0x000000000000ffff0000000000000000.
+      QImode: 0x00000000000000ff0000000000000000.  */
+  rtx mask = gen_reg_rtx (V16QImode);
+  rtx mask_v2di = gen_reg_rtx (V2DImode);
+  rtvec v = rtvec_alloc (2);
+  if (!BYTES_BIG_ENDIAN)
+    {
+      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, 0);
+      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, mode_mask);
+    }
+  else
+    {
+      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, mode_mask);
+      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, 0);
+    }
+  emit_insn (gen_vec_initv2didi (mask_v2di, gen_rtx_PARALLEL (V2DImode, v)));
+  rtx sub_mask = simplify_gen_subreg (V16QImode, mask_v2di, V2DImode, 0);
+  emit_insn (gen_rtx_SET (mask, sub_mask));
+
+  /*  mtvsrd[wz] f0,tmp_val.  */
+  rtx tmp_val = gen_reg_rtx (SImode);
+  if (inner_mode == E_SFmode)
+    emit_insn (gen_movsi_from_sf (tmp_val, val));
+  else
+    tmp_val = force_reg (SImode, val);
+
+  rtx val_v16qi = gen_reg_rtx (V16QImode);
+  rtx val_v2di = gen_reg_rtx (V2DImode);
+  rtvec vec_val = rtvec_alloc (2);
+  if (!BYTES_BIG_ENDIAN)
+  {
+    RTVEC_ELT (vec_val, 0) = gen_rtx_CONST_INT (DImode, 0);
+    RTVEC_ELT (vec_val, 1) = tmp_val;
+  }
+  else
+  {
+    RTVEC_ELT (vec_val, 0) = tmp_val;
+    RTVEC_ELT (vec_val, 1) = gen_rtx_CONST_INT (DImode, 0);
+  }
+  emit_insn (
+    gen_vec_initv2didi (val_v2di, gen_rtx_PARALLEL (V2DImode, vec_val)));
+  rtx sub_val = simplify_gen_subreg (V16QImode, val_v2di, V2DImode, 0);
+  emit_insn (gen_rtx_SET (val_v16qi, sub_val));
+
+  /*  lvsl    13,0,idx.  */
+  tmp = convert_modes (DImode, SImode, tmp, 1);
+  rtx pcv = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_lvsl_reg (pcv, tmp));
+
+  /*  vperm 1,1,1,13.  */
+  /*  vperm 0,0,0,13.  */
+  rtx val_perm = gen_reg_rtx (V16QImode);
+  rtx mask_perm = gen_reg_rtx (V16QImode);
+  emit_insn (gen_altivec_vperm_v8hiv16qi (val_perm, val_v16qi, val_v16qi, pcv));
+  emit_insn (gen_altivec_vperm_v8hiv16qi (mask_perm, mask, mask, pcv));
+
+  rtx target_v16qi = simplify_gen_subreg (V16QImode, target, mode, 0);
+
+  /*  xxsel 34,34,32,33.  */
+  emit_insn (
+    gen_vector_select_v16qi (target_v16qi, target_v16qi, val_perm, mask_perm));
+}
+
+/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
+   is variable and also counts by vector element size.  */
+
+void
+rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
+{
+  machine_mode mode = GET_MODE (target);
+  machine_mode inner_mode = GET_MODE_INNER (mode);
+  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
+    rs6000_expand_vector_set_var_p9 (target, val, idx);
+  else
+    rs6000_expand_vector_set_var_p8 (target, val, idx);
+}
+
 /* Extract field ELT from VEC into TARGET.  */
 
 void
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
new file mode 100644
index 00000000000..06da47b7758
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
+
+#include <stddef.h>
+#include <altivec.h>
+#include "pr79251.h"
+
+TEST_VEC_INSERT_ALL (test)
+
+/* { dg-final { scan-assembler-not {\mstxw\M} } } */
+/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mlvsr\M} 3 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 20 } } */
+/* { dg-final { scan-assembler-times {\mxxpermdi\M} 10 } } */
+/* { dg-final { scan-assembler-times {\mxxsel\M} 7 } } */
+
-- 
2.25.1



* [PATCH 4/4] rs6000: Update testcases' instruction count
  2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
                   ` (2 preceding siblings ...)
  2020-10-10  8:08 ` [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8 Xionghu Luo
@ 2020-10-10  8:08 ` Xionghu Luo
  2021-01-22  0:17   ` Segher Boessenkool
  2020-11-05  1:34 ` Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
  4 siblings, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-10-10  8:08 UTC
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw, Xionghu Luo

gcc/testsuite/ChangeLog:

2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>

	* gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
	instruction counts.
	* gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
	* gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
	* gcc.target/powerpc/vsx-builtin-7.c: Likewise.
---
 .../gcc.target/powerpc/fold-vec-insert-char-p8.c     | 11 ++++++-----
 .../gcc.target/powerpc/fold-vec-insert-char-p9.c     | 12 ++++++------
 .../gcc.target/powerpc/fold-vec-insert-double.c      | 11 ++++++++---
 .../gcc.target/powerpc/fold-vec-insert-float-p8.c    |  6 +++---
 .../gcc.target/powerpc/fold-vec-insert-float-p9.c    | 10 +++++-----
 .../gcc.target/powerpc/fold-vec-insert-int-p8.c      |  9 +++++----
 .../gcc.target/powerpc/fold-vec-insert-int-p9.c      | 11 +++++------
 .../gcc.target/powerpc/fold-vec-insert-longlong.c    | 10 +++-------
 .../gcc.target/powerpc/fold-vec-insert-short-p8.c    |  9 +++++----
 .../gcc.target/powerpc/fold-vec-insert-short-p9.c    |  8 ++++----
 gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c     |  4 ++--
 11 files changed, 52 insertions(+), 49 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p8.c
index b13c8ca19c7..1ad23de99a9 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p8.c
@@ -44,15 +44,16 @@ vector unsigned char testuu_cst (unsigned char x, vector unsigned char v)
        return vec_insert (x, v, 12);
 }
 
-/* one store per _var test */
-/* { dg-final { scan-assembler-times {\mstvx\M|\mstxvw4x\M} 4 } } */
+/* no store per _var test */
+/* { dg-final { scan-assembler-times {\mstvx\M|\mstxvw4x\M} 0 } } */
 /* one store-byte per test */
-/* { dg-final { scan-assembler-times {\mstb\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mstb\M} 4 } } */
 /* one load per test */
-/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 8 } } */
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 8 { target le } } } */
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 4 { target be } } } */
 
 /* one lvebx per _cst test.*/
 /* { dg-final { scan-assembler-times {\mlvebx\M} 4 } } */
 /* one vperm per _cst test.*/
-/* { dg-final { scan-assembler-times {\mvperm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 12 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p9.c
index 16432289d68..400caa31bb4 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-char-p9.c
@@ -44,13 +44,13 @@ vector unsigned char testuu_cst (unsigned char x, vector unsigned char v)
        return vec_insert (x, v, 12);
 }
 
-/* load immediate, add, store, stb, load variable test.  */
-/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mstb\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mlvebx\M|\mlxv\M|\mlvx\M} 4 { target lp64} } } */
+/* no store per _var test.  */
+/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mstb\M} 0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mlvebx\M|\mlxv\M|\mlvx\M} 0 { target lp64} } } */
 /* an insert and a move per constant test. */
-/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mvinsertb\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mvinsertb\M} 8 { target lp64 } } } */
 
 /* -m32 codegen. */
 /* { dg-final { scan-assembler-times {\mrlwinm\M} 4 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-double.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-double.c
index 435d28d5420..842fe9bbcad 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-double.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-double.c
@@ -23,7 +23,12 @@ testd_cst (double d, vector double vd)
 /* { dg-final { scan-assembler {\mxxpermdi\M} } } */
 
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mstfdx\M|\mstfd\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 1 } } */
+
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 1 { target { ! has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mstfdx\M|\mstfd\M} 1 { target { ! has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 1 { target { ! has_arch_pwr8 } } } } */
+
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstxv\M|\mstvx\M} 0 { target { has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mstfdx\M|\mstfd\M} 0 { target { has_arch_pwr8 } } } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 0 { target { has_arch_pwr8 } } } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p8.c
index 7682aea8165..6a3b1b4c39e 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p8.c
@@ -19,12 +19,12 @@ testf_cst (float f, vector float vf)
   return vec_insert (f, vf, 12);
 }
 
-/* { dg-final { scan-assembler-times {\mstvx\M|\mstxv\M|\mstxvd2x\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mstvx\M|\mstxv\M|\mstxvd2x\M} 0 } } */
 /* cst tests has stfs instead of stfsx. */
-/* { dg-final { scan-assembler-times {\mstfs\M|\mstfsx\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mstfs\M|\mstfsx\M} 1 } } */
 /* { dg-final { scan-assembler-times {\mlvx\M|\mlxv\M|\mlxvd2x\M|\mlxvw4x\M} 2 } } */
 
 /* cst test has a lvewx,vperm combo */
 /* { dg-final { scan-assembler-times {\mlvewx\M} 1 } } */
-/* { dg-final { scan-assembler-times {\mvperm\M} 1 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 3 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
index 93c263e04da..9b719a07916 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-float-p9.c
@@ -20,13 +20,13 @@ testf_cst (float f, vector float vf)
 }
 
 /* var test has a load and store. */
-/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mstfsx\M} 1 { target lp64} } } */
+/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mstfsx\M} 0 { target lp64} } } */
 
 /* cst test have a xscvdpspn,xxextractuw,xxinsertw combo */
-/* { dg-final { scan-assembler-times {\mxscvdpspn\M} 1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mxxextractuw\M} 1 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mxxinsertw\M} 1 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mxscvdpspn\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mxxextractuw\M} 2 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mxxinsertw\M} 2 { target lp64 } } } */
 
 /* { dg-final { scan-assembler-times {\mstfs\M} 2 { target ilp32 } } } */
 /* { dg-final { scan-assembler-times {\mlxv\M} 2 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p8.c
index 4a3b1ae6fc1..6e4851de658 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p8.c
@@ -49,9 +49,10 @@ testui2_cst(unsigned int x, vector unsigned int v)
 }
 
 /* Each test has lvx (8).  cst tests have additional lvewx. (4) */
-/* var tests have both stwx (4) and stvx (4).  cst tests have stw (4).*/
-/* { dg-final { scan-assembler-times {\mstvx\M|\mstwx\M|\mstw\M|\mstxvw4x\M} 12 } } */
-/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 8 } } */
+/* var tests have no stwx and stvx.  cst tests have stw (4).*/
+/* { dg-final { scan-assembler-times {\mstvx\M|\mstwx\M|\mstw\M|\mstxvw4x\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 8 { target le } } } */
+/* { dg-final { scan-assembler-times {\mlvx\M|\mlxvw4x\M} 4 { target be } } } */
 
 /* { dg-final { scan-assembler-times {\mlvewx\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mvperm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 12 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p9.c
index 5ba5d53f276..50af92168f1 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-int-p9.c
@@ -49,14 +49,13 @@ testui2_cst(unsigned int x, vector unsigned int v)
 }
 
 
-/* load immediate, add, store, stb, load variable test.  */
-/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstwx\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mstwx\M} 0 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 0 { target lp64 } } } */
 
 /* an insert and a move per constant test. */
-/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mxxinsertw\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mxxinsertw\M} 8 { target lp64 } } } */
 
 
 /* { dg-final { scan-assembler-times {\maddi\M} 12 { target ilp32 } } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-longlong.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-longlong.c
index 337b38fb7d3..e003b76d0b9 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-longlong.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-longlong.c
@@ -60,13 +60,9 @@ testul2_cst(unsigned long long x, vector unsigned long long v)
 
 /* { dg-final { scan-assembler-times {\mrldic\M|\mrlwinm\M} 4 } } */
 
-/* The number of addi instructions decreases on newer systems.  Measured as 8 on
- power7 and power8 targets, and drops to 4 on power9 targets that use the
- newer stxv,lxv instructions.  For this test ensure we get at least one.  */
-/* { dg-final { scan-assembler {\maddi\M} } } */
-/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mstdx\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mstxvd2x\M|\mstvx\M|\mstxv\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mstdx\M} 0 { target lp64 } } } */
 /* { dg-final { scan-assembler-times {\mstw\M} 8 { target ilp32 } } } */
 
-/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mlxvd2x\M|\mlxv\M|\mlvx\M} 0 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p8.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p8.c
index 3ed40043095..d3faae018d7 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p8.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p8.c
@@ -48,10 +48,11 @@ testus2_cst(unsigned short x, vector unsigned short v)
    return vec_insert(x, v, 12);
 }
 
-/* { dg-final { scan-assembler-times {\mlhz\M|\mlvx\M|\mlxv\M|\mlxvw4x\M} 8 } } */
-/* stores.. 2 each per variable tests, 1 each per cst test. */
-/* { dg-final { scan-assembler-times {\msthx\M|\mstvx\M|\msth\M|\mstxvw4x\M} 12 } } */
+/* { dg-final { scan-assembler-times {\mlhz\M|\mlvx\M|\mlxv\M|\mlxvw4x\M} 8 { target le } } } */
+/* { dg-final { scan-assembler-times {\mlhz\M|\mlvx\M|\mlxv\M|\mlxvw4x\M} 4 { target be } } } */
+/* stores.. 0 per variable tests, 1 each per cst test. */
+/* { dg-final { scan-assembler-times {\msthx\M|\mstvx\M|\msth\M|\mstxvw4x\M} 4 } } */
 
 /* { dg-final { scan-assembler-times {\mlvehx\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mvperm\M} 4 } } */
+/* { dg-final { scan-assembler-times {\mvperm\M} 12 } } */
 
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p9.c b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p9.c
index f09fd21691c..d864a83ee0f 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p9.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-insert-short-p9.c
@@ -48,11 +48,11 @@ testus2_cst(unsigned short x, vector unsigned short v)
    return vec_insert(x, v, 12);
 }
 
-/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 4 { target lp64 } } } */
-/* { dg-final { scan-assembler-times {\mvinserth\M} 4 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mmtvsrwz\M} 8 { target lp64 } } } */
+/* { dg-final { scan-assembler-times {\mvinserth\M} 8 { target lp64 } } } */
 
-/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 4 } } */
-/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 4 { target lp64 }} } */
+/* { dg-final { scan-assembler-times {\mstxv\M|\mstvx\M} 0 } } */
+/* { dg-final { scan-assembler-times {\mlxv\M|\mlvx\M} 0 { target lp64 }} } */
 
 /* -m32 uses sth/lvehx as part of the sequence. */
 /* { dg-final { scan-assembler-times {\msth\M} 8 { target ilp32 }} } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
index 0780b01ffab..341cfb15555 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-7.c
@@ -193,8 +193,8 @@ vector unsigned __int128 splat_uint128 (unsigned __int128 x) { return vec_splats
 /* { dg-final { scan-assembler-times {\mrldic\M} 0  { target { be && ilp32 } } } } */
 /* { dg-final { scan-assembler-times {\mrldic\M} 64 { target { be && lp64 } } } } */
 /* { dg-final { scan-assembler-times {\mrldic\M} 64 { target le } } } */
-/* { dg-final { scan-assembler-times "xxpermdi" 4 { target be } } } */
-/* { dg-final { scan-assembler-times "xxpermdi" 6 { target le } } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 11 { target be } } } */
+/* { dg-final { scan-assembler-times "xxpermdi" 19 { target le } } } */
 /* { dg-final { scan-assembler-times "vspltisb" 2 } } */
 /* { dg-final { scan-assembler-times "vspltish" 2 } } */
 /* { dg-final { scan-assembler-times "vspltisw" 2 } } */
-- 
2.25.1



* Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET
  2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
                   ` (3 preceding siblings ...)
  2020-10-10  8:08 ` [PATCH 4/4] rs6000: Update testcases' instruction count Xionghu Luo
@ 2020-11-05  1:34 ` Xionghu Luo
  2020-11-13  2:05   ` Ping^2: " Xionghu Luo
  4 siblings, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-11-05  1:34 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw

Ping.

On 2020/10/10 16:08, Xionghu Luo wrote:
> Originated from
> https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554240.html
> with patch split and some refinement per review comments.
> 
> The IFN VEC_SET patch for ARRAY_REF(VIEW_CONVERT_EXPR) has been committed;
> this patch set enables expanding IFN VEC_SET for Power9 and Power8
> with specific instruction sequences.
> 
> Xionghu Luo (4):
>    rs6000: Change rs6000_expand_vector_set param
>    rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
>    rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
>    rs6000: Update testcases' instruction count
> 
>   gcc/config/rs6000/rs6000-c.c                  |  44 +++--
>   gcc/config/rs6000/rs6000-call.c               |   2 +-
>   gcc/config/rs6000/rs6000-protos.h             |   3 +-
>   gcc/config/rs6000/rs6000.c                    | 181 +++++++++++++++++-
>   gcc/config/rs6000/vector.md                   |   4 +-
>   .../powerpc/fold-vec-insert-char-p8.c         |   8 +-
>   .../powerpc/fold-vec-insert-char-p9.c         |  12 +-
>   .../powerpc/fold-vec-insert-double.c          |  11 +-
>   .../powerpc/fold-vec-insert-float-p8.c        |   6 +-
>   .../powerpc/fold-vec-insert-float-p9.c        |  10 +-
>   .../powerpc/fold-vec-insert-int-p8.c          |   6 +-
>   .../powerpc/fold-vec-insert-int-p9.c          |  11 +-
>   .../powerpc/fold-vec-insert-longlong.c        |  10 +-
>   .../powerpc/fold-vec-insert-short-p8.c        |   6 +-
>   .../powerpc/fold-vec-insert-short-p9.c        |   8 +-
>   .../gcc.target/powerpc/pr79251-run.c          |  28 +++
>   gcc/testsuite/gcc.target/powerpc/pr79251.h    |  19 ++
>   gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 ++
>   gcc/testsuite/gcc.target/powerpc/pr79251.p9.c |  18 ++
>   .../gcc.target/powerpc/vsx-builtin-7.c        |   4 +-
>   20 files changed, 337 insertions(+), 71 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251-run.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.h
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p9.c
> 

-- 
Thanks,
Xionghu


* Ping^2: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET
  2020-11-05  1:34 ` Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
@ 2020-11-13  2:05   ` Xionghu Luo
  2020-11-24  2:29     ` Ping^3: " Xionghu Luo
  0 siblings, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-11-13  2:05 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, dje.gcc, segher, linkw

Ping^2, thanks.

On 2020/11/5 09:34, Xionghu Luo via Gcc-patches wrote:
> Ping.
> 

-- 
Thanks,
Xionghu


* Re: Ping^3: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET
  2020-11-13  2:05   ` Ping^2: " Xionghu Luo
@ 2020-11-24  2:29     ` Xionghu Luo
  0 siblings, 0 replies; 21+ messages in thread
From: Xionghu Luo @ 2020-11-24  2:29 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, segher, dje.gcc, linkw

Ping^3, thanks.

https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555905.html


On 2020/11/13 10:05, Xionghu Luo via Gcc-patches wrote:
> Ping^2, thanks.
> 

-- 
Thanks,
Xionghu


* Re: [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param
  2020-10-10  8:08 ` [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param Xionghu Luo
@ 2020-11-24 19:44   ` Segher Boessenkool
  0 siblings, 0 replies; 21+ messages in thread
From: Segher Boessenkool @ 2020-11-24 19:44 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

On Sat, Oct 10, 2020 at 03:08:22AM -0500, Xionghu Luo wrote:
> rs6000_expand_vector_set can insert at either a constant position or a
> variable position, so change the operand to reg_or_cint_operand.

This is okay for trunk.  Thank you!


Segher


> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* config/rs6000/rs6000-call.c (altivec_expand_vec_set_builtin):
> 	Change call param 2 from type int to rtx.
> 	* config/rs6000/rs6000-protos.h (rs6000_expand_vector_set):
> 	Likewise.
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_init):
> 	Change call param 2 from type int to rtx.
> 	(rs6000_expand_vector_set): Likewise.
> 	* config/rs6000/vector.md (vec_set<mode>): Support both constant
> 	and variable index vec_set.


* Re: [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251]
  2020-10-10  8:08 ` [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251] Xionghu Luo
@ 2020-11-24 22:37   ` Segher Boessenkool
  0 siblings, 0 replies; 21+ messages in thread
From: Segher Boessenkool @ 2020-11-24 22:37 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Hi!

On Sat, Oct 10, 2020 at 03:08:23AM -0500, Xionghu Luo wrote:
> vec_insert accepts 3 arguments: arg0 is the input vector, arg1 is the value
> to be inserted, and arg2 is the position in arg0 at which arg1 is inserted.
> The current expander generates stxv+stwx+lxv if arg2 is variable rather than
> constant, which causes a serious store-hit-load performance issue on Power.
> This patch tries to:
>  1) Build VIEW_CONVERT_EXPR for vec_insert (i, v, n) like v[n&3] = i to
> unify the gimple code, so that the expander can use vec_set_optab to expand.
>  2) Expand the IFN VEC_SET to fast instructions: lvsr+insert+lvsl.
> In this way, "vec_insert (i, v, n)" and "v[n&3] = i" won't be expanded too
> early in the gimple stage if arg2 is variable, which avoids generating
> store-hit-load instructions.
> 
> For Power9 V4SI:
> 	addi 9,1,-16
> 	rldic 6,6,2,60
> 	stxv 34,-16(1)
> 	stwx 5,9,6
> 	lxv 34,-16(1)
> =>
> 	rlwinm 6,6,2,28,29
> 	mtvsrwz 0,5
> 	lvsr 1,0,6
> 	lvsl 0,0,6
> 	xxperm 34,34,33
> 	xxinsertw 34,0,12
> 	xxperm 34,34,32

It still takes me quite some time to verify this, tricky bit-fiddling!

But the code that generates this is easier to read :-)
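
For reference, the semantics that step 1) of the quoted description unifies
can be sketched in portable C.  This is only an illustration, not code from
the patch: it uses GCC's generic vector extension instead of <altivec.h>,
and insert_v4si is a made-up name.  It assumes a V4SI vector, so the index
is reduced with "& 3".

```c
/* Scalar-level model of vec_insert (i, v, n) with a variable index.
   Uses GCC's generic vector extension, so it builds on any target.  */
typedef int v4si __attribute__ ((vector_size (16)));

/* Hypothetical helper (not in the patch): the index is taken modulo the
   number of elements (4 for V4SI) before the element is overwritten.  */
v4si
insert_v4si (int i, v4si v, unsigned int n)
{
  v[n & 3] = i;
  return v;
}
```

With this form the index stays a run-time value until expansion, which is
what lets the back end choose a register-only sequence.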

> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> +   is variable and also counts by vector element size.  */

"Set vector element IDX of TARGET to VAL.  IDX is not a constant
integer."?


Okay for trunk (with an improved comment).  Thanks!


Segher


* Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-10-10  8:08 ` [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8 Xionghu Luo
@ 2020-11-27  1:04   ` Xionghu Luo
  2020-12-03 14:16     ` Xionghu Luo
  2021-01-21 23:48   ` Segher Boessenkool
  1 sibling, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-11-27  1:04 UTC (permalink / raw)
  To: gcc-patches; +Cc: segher, dje.gcc, wschmidt, guojiufu, linkw

Hi Segher,
Thanks for approving [PATCH 1/4] and [PATCH 2/4].  What is your opinion of
this [PATCH 3/4] for P8, please?  xxinsertw is only available since ISA v3.0,
so P8 needs a different implementation.


Xionghu


On 2020/10/10 16:08, Xionghu Luo wrote:
> gcc/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> 	Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
> 	platforms.
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
> 	to call different path for P8 and P9.
> 	(rs6000_expand_vector_set_var_p9): New function.
> 	(rs6000_expand_vector_set_var_p8): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* gcc.target/powerpc/pr79251.p8.c: New test.
> ---
>   gcc/config/rs6000/rs6000-c.c                  |  27 +++-
>   gcc/config/rs6000/rs6000.c                    | 117 +++++++++++++++++-
>   gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +++
>   3 files changed, 155 insertions(+), 6 deletions(-)
>   create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
> 
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index 5551a21d738..4bea8001ec6 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -1599,10 +1599,29 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>   	  SET_EXPR_LOCATION (stmt, loc);
>   	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
>   	}
> -      stmt = build_array_ref (loc, stmt, arg2);
> -      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> -			  convert (TREE_TYPE (stmt), arg0));
> -      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +
> +      if (TARGET_P8_VECTOR)
> +	{
> +	  stmt = build_array_ref (loc, stmt, arg2);
> +	  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
> +			      convert (TREE_TYPE (stmt), arg0));
> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +	}
> +      else
> +	{
> +	  tree arg1_inner_type;
> +	  tree innerptrtype;
> +	  arg1_inner_type = TREE_TYPE (arg1_type);
> +	  innerptrtype = build_pointer_type (arg1_inner_type);
> +
> +	  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
> +	  stmt = convert (innerptrtype, stmt);
> +	  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
> +	  stmt = build_indirect_ref (loc, stmt, RO_NULL);
> +	  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
> +			 convert (TREE_TYPE (stmt), arg0));
> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
> +	}
>         return stmt;
>       }
>   
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 96f76c7a74c..33ca839cb28 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6806,10 +6806,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
>   }
>   
>   /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> -   is variable and also counts by vector element size.  */
> +   is variable and also counts by vector element size for p9 and above.  */
>   
>   void
> -rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
> +rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>   {
>     machine_mode mode = GET_MODE (target);
>   
> @@ -6852,6 +6852,119 @@ rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>     emit_insn (perml);
>   }
>   
> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> +   is variable and also counts by vector element size for p8.  */
> +
> +void
> +rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
> +{
> +  machine_mode mode = GET_MODE (target);
> +
> +  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
> +
> +  gcc_assert (GET_MODE (idx) == E_SImode);
> +
> +  machine_mode inner_mode = GET_MODE (val);
> +  HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
> +
> +  rtx tmp = gen_reg_rtx (GET_MODE (idx));
> +  int width = GET_MODE_SIZE (inner_mode);
> +
> +  gcc_assert (width >= 1 && width <= 4);
> +
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      /*  idx = idx * width.  */
> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> +      /*  idx = idx + 8.  */
> +      emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
> +    }
> +  else
> +    {
> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
> +      emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
> +    }
> +
> +  /*  lxv vs33, mask.
> +      DImode: 0xffffffffffffffff0000000000000000
> +      SImode: 0x00000000ffffffff0000000000000000
> +      HImode: 0x000000000000ffff0000000000000000.
> +      QImode: 0x00000000000000ff0000000000000000.  */
> +  rtx mask = gen_reg_rtx (V16QImode);
> +  rtx mask_v2di = gen_reg_rtx (V2DImode);
> +  rtvec v = rtvec_alloc (2);
> +  if (!BYTES_BIG_ENDIAN)
> +    {
> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, 0);
> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, mode_mask);
> +    }
> +  else
> +    {
> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, mode_mask);
> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, 0);
> +    }
> +  emit_insn (gen_vec_initv2didi (mask_v2di, gen_rtx_PARALLEL (V2DImode, v)));
> +  rtx sub_mask = simplify_gen_subreg (V16QImode, mask_v2di, V2DImode, 0);
> +  emit_insn (gen_rtx_SET (mask, sub_mask));
> +
> +  /*  mtvsrd[wz] f0,tmp_val.  */
> +  rtx tmp_val = gen_reg_rtx (SImode);
> +  if (inner_mode == E_SFmode)
> +    emit_insn (gen_movsi_from_sf (tmp_val, val));
> +  else
> +    tmp_val = force_reg (SImode, val);
> +
> +  rtx val_v16qi = gen_reg_rtx (V16QImode);
> +  rtx val_v2di = gen_reg_rtx (V2DImode);
> +  rtvec vec_val = rtvec_alloc (2);
> +  if (!BYTES_BIG_ENDIAN)
> +  {
> +    RTVEC_ELT (vec_val, 0) = gen_rtx_CONST_INT (DImode, 0);
> +    RTVEC_ELT (vec_val, 1) = tmp_val;
> +  }
> +  else
> +  {
> +    RTVEC_ELT (vec_val, 0) = tmp_val;
> +    RTVEC_ELT (vec_val, 1) = gen_rtx_CONST_INT (DImode, 0);
> +  }
> +  emit_insn (
> +    gen_vec_initv2didi (val_v2di, gen_rtx_PARALLEL (V2DImode, vec_val)));
> +  rtx sub_val = simplify_gen_subreg (V16QImode, val_v2di, V2DImode, 0);
> +  emit_insn (gen_rtx_SET (val_v16qi, sub_val));
> +
> +  /*  lvsl    13,0,idx.  */
> +  tmp = convert_modes (DImode, SImode, tmp, 1);
> +  rtx pcv = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_altivec_lvsl_reg (pcv, tmp));
> +
> +  /*  vperm 1,1,1,13.  */
> +  /*  vperm 0,0,0,13.  */
> +  rtx val_perm = gen_reg_rtx (V16QImode);
> +  rtx mask_perm = gen_reg_rtx (V16QImode);
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (val_perm, val_v16qi, val_v16qi, pcv));
> +  emit_insn (gen_altivec_vperm_v8hiv16qi (mask_perm, mask, mask, pcv));
> +
> +  rtx target_v16qi = simplify_gen_subreg (V16QImode, target, mode, 0);
> +
> +  /*  xxsel 34,34,32,33.  */
> +  emit_insn (
> +    gen_vector_select_v16qi (target_v16qi, target_v16qi, val_perm, mask_perm));
> +}
> +
> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
> +   is variable and also counts by vector element size.  */
> +
> +void
> +rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
> +{
> +  machine_mode mode = GET_MODE (target);
> +  machine_mode inner_mode = GET_MODE_INNER (mode);
> +  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
> +    rs6000_expand_vector_set_var_p9 (target, val, idx);
> +  else
> +    rs6000_expand_vector_set_var_p8 (target, val, idx);
> +}
> +
>   /* Extract field ELT from VEC into TARGET.  */
>   
>   void
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
> new file mode 100644
> index 00000000000..06da47b7758
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
> +
> +#include <stddef.h>
> +#include <altivec.h>
> +#include "pr79251.h"
> +
> +TEST_VEC_INSERT_ALL (test)
> +
> +/* { dg-final { scan-assembler-not {\mstxw\M} } } */
> +/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mlvsr\M} 3 } } */
> +/* { dg-final { scan-assembler-times {\mvperm\M} 20 } } */
> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 10 } } */
> +/* { dg-final { scan-assembler-times {\mxxsel\M} 7 } } */
> +
> 
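
For readers following along, here is a minimal sketch of what a
variable-index vec_insert is expected to compute.  It uses GCC's generic
vector extension instead of <altivec.h> so it builds on any target.  The
names v4si_model and insert_model are made up for illustration and are not
part of the patch.  The modulo on the index is an assumption: that
vec_insert takes the element number modulo the number of elements.

```c
#include <stdint.h>

/* Hypothetical stand-in for "vector signed int", using the GNU vector
   extension rather than altivec.h.  */
typedef int32_t v4si_model __attribute__ ((vector_size (16)));

/* Model of vec_insert (val, v, idx) with a run-time index.  The index is
   reduced modulo the element count (assumption, see the note above).  */
static v4si_model
insert_model (int32_t val, v4si_model v, unsigned idx)
{
  v[idx % 4] = val;
  return v;
}
```

With v = {1, 2, 3, 4}, insert_model (9, v, 6) writes element 2 and leaves
the other three elements untouched.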

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-11-27  1:04   ` Xionghu Luo
@ 2020-12-03 14:16     ` Xionghu Luo
  2020-12-10  3:32       ` Xionghu Luo
  2020-12-23  2:18       ` Ping ^ 3: " Xionghu Luo
  0 siblings, 2 replies; 21+ messages in thread
From: Xionghu Luo @ 2020-12-03 14:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, dje.gcc, segher, linkw

Ping. Thanks.


On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
> Hi Segher,
> Thanks for approving [PATCH 1/4] and [PATCH 2/4].  What is your opinion
> of this [PATCH 3/4] for P8, please?  xxinsertw only exists since ISA
> v3.0, so we had to implement it another way.
> 
> 
> Xionghu
> 
> 
> On 2020/10/10 16:08, Xionghu Luo wrote:
>> gcc/ChangeLog:
>>
>> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
>>
>> 	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>> 	Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
>> 	platforms.
>> 	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
>> 	to call different path for P8 and P9.
>> 	(rs6000_expand_vector_set_var_p9): New function.
>> 	(rs6000_expand_vector_set_var_p8): New function.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
>>
>> 	* gcc.target/powerpc/pr79251.p8.c: New test.
>> ---
>>    gcc/config/rs6000/rs6000-c.c                  |  27 +++-
>>    gcc/config/rs6000/rs6000.c                    | 117 +++++++++++++++++-
>>    gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +++
>>    3 files changed, 155 insertions(+), 6 deletions(-)
>>    create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>>
>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>> index 5551a21d738..4bea8001ec6 100644
>> --- a/gcc/config/rs6000/rs6000-c.c
>> +++ b/gcc/config/rs6000/rs6000-c.c
>> @@ -1599,10 +1599,29 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>>    	  SET_EXPR_LOCATION (stmt, loc);
>>    	  stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
>>    	}
>> -      stmt = build_array_ref (loc, stmt, arg2);
>> -      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>> -			  convert (TREE_TYPE (stmt), arg0));
>> -      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>> +
>> +      if (TARGET_P8_VECTOR)
>> +	{
>> +	  stmt = build_array_ref (loc, stmt, arg2);
>> +	  stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>> +			      convert (TREE_TYPE (stmt), arg0));
>> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>> +	}
>> +      else
>> +	{
>> +	  tree arg1_inner_type;
>> +	  tree innerptrtype;
>> +	  arg1_inner_type = TREE_TYPE (arg1_type);
>> +	  innerptrtype = build_pointer_type (arg1_inner_type);
>> +
>> +	  stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
>> +	  stmt = convert (innerptrtype, stmt);
>> +	  stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
>> +	  stmt = build_indirect_ref (loc, stmt, RO_NULL);
>> +	  stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
>> +			 convert (TREE_TYPE (stmt), arg0));
>> +	  stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>> +	}
>>          return stmt;
>>        }
>>    
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 96f76c7a74c..33ca839cb28 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -6806,10 +6806,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
>>    }
>>    
>>    /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>> -   is variable and also counts by vector element size.  */
>> +   is variable and also counts by vector element size for p9 and above.  */
>>    
>>    void
>> -rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>> +rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>>    {
>>      machine_mode mode = GET_MODE (target);
>>    
>> @@ -6852,6 +6852,119 @@ rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>>      emit_insn (perml);
>>    }
>>    
>> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>> +   is variable and also counts by vector element size for p8.  */
>> +
>> +void
>> +rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +
>> +  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
>> +
>> +  gcc_assert (GET_MODE (idx) == E_SImode);
>> +
>> +  machine_mode inner_mode = GET_MODE (val);
>> +  HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
>> +
>> +  rtx tmp = gen_reg_rtx (GET_MODE (idx));
>> +  int width = GET_MODE_SIZE (inner_mode);
>> +
>> +  gcc_assert (width >= 1 && width <= 4);
>> +
>> +  if (!BYTES_BIG_ENDIAN)
>> +    {
>> +      /*  idx = idx * width.  */
>> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
>> +      /*  idx = idx + 8.  */
>> +      emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
>> +    }
>> +  else
>> +    {
>> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
>> +      emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
>> +    }
>> +
>> +  /*  lxv vs33, mask.
>> +      DImode: 0xffffffffffffffff0000000000000000
>> +      SImode: 0x00000000ffffffff0000000000000000
>> +      HImode: 0x000000000000ffff0000000000000000.
>> +      QImode: 0x00000000000000ff0000000000000000.  */
>> +  rtx mask = gen_reg_rtx (V16QImode);
>> +  rtx mask_v2di = gen_reg_rtx (V2DImode);
>> +  rtvec v = rtvec_alloc (2);
>> +  if (!BYTES_BIG_ENDIAN)
>> +    {
>> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, 0);
>> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, mode_mask);
>> +    }
>> +  else
>> +    {
>> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, mode_mask);
>> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, 0);
>> +    }
>> +  emit_insn (gen_vec_initv2didi (mask_v2di, gen_rtx_PARALLEL (V2DImode, v)));
>> +  rtx sub_mask = simplify_gen_subreg (V16QImode, mask_v2di, V2DImode, 0);
>> +  emit_insn (gen_rtx_SET (mask, sub_mask));
>> +
>> +  /*  mtvsrd[wz] f0,tmp_val.  */
>> +  rtx tmp_val = gen_reg_rtx (SImode);
>> +  if (inner_mode == E_SFmode)
>> +    emit_insn (gen_movsi_from_sf (tmp_val, val));
>> +  else
>> +    tmp_val = force_reg (SImode, val);
>> +
>> +  rtx val_v16qi = gen_reg_rtx (V16QImode);
>> +  rtx val_v2di = gen_reg_rtx (V2DImode);
>> +  rtvec vec_val = rtvec_alloc (2);
>> +  if (!BYTES_BIG_ENDIAN)
>> +  {
>> +    RTVEC_ELT (vec_val, 0) = gen_rtx_CONST_INT (DImode, 0);
>> +    RTVEC_ELT (vec_val, 1) = tmp_val;
>> +  }
>> +  else
>> +  {
>> +    RTVEC_ELT (vec_val, 0) = tmp_val;
>> +    RTVEC_ELT (vec_val, 1) = gen_rtx_CONST_INT (DImode, 0);
>> +  }
>> +  emit_insn (
>> +    gen_vec_initv2didi (val_v2di, gen_rtx_PARALLEL (V2DImode, vec_val)));
>> +  rtx sub_val = simplify_gen_subreg (V16QImode, val_v2di, V2DImode, 0);
>> +  emit_insn (gen_rtx_SET (val_v16qi, sub_val));
>> +
>> +  /*  lvsl    13,0,idx.  */
>> +  tmp = convert_modes (DImode, SImode, tmp, 1);
>> +  rtx pcv = gen_reg_rtx (V16QImode);
>> +  emit_insn (gen_altivec_lvsl_reg (pcv, tmp));
>> +
>> +  /*  vperm 1,1,1,13.  */
>> +  /*  vperm 0,0,0,13.  */
>> +  rtx val_perm = gen_reg_rtx (V16QImode);
>> +  rtx mask_perm = gen_reg_rtx (V16QImode);
>> +  emit_insn (gen_altivec_vperm_v8hiv16qi (val_perm, val_v16qi, val_v16qi, pcv));
>> +  emit_insn (gen_altivec_vperm_v8hiv16qi (mask_perm, mask, mask, pcv));
>> +
>> +  rtx target_v16qi = simplify_gen_subreg (V16QImode, target, mode, 0);
>> +
>> +  /*  xxsel 34,34,32,33.  */
>> +  emit_insn (
>> +    gen_vector_select_v16qi (target_v16qi, target_v16qi, val_perm, mask_perm));
>> +}
>> +
>> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>> +   is variable and also counts by vector element size.  */
>> +
>> +void
>> +rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>> +{
>> +  machine_mode mode = GET_MODE (target);
>> +  machine_mode inner_mode = GET_MODE_INNER (mode);
>> +  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
>> +    rs6000_expand_vector_set_var_p9 (target, val, idx);
>> +  else
>> +    rs6000_expand_vector_set_var_p8 (target, val, idx);
>> +}
>> +
>>    /* Extract field ELT from VEC into TARGET.  */
>>    
>>    void
>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>> new file mode 100644
>> index 00000000000..06da47b7758
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>> @@ -0,0 +1,17 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
>> +
>> +#include <stddef.h>
>> +#include <altivec.h>
>> +#include "pr79251.h"
>> +
>> +TEST_VEC_INSERT_ALL (test)
>> +
>> +/* { dg-final { scan-assembler-not {\mstxw\M} } } */
>> +/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mlvsr\M} 3 } } */
>> +/* { dg-final { scan-assembler-times {\mvperm\M} 20 } } */
>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 10 } } */
>> +/* { dg-final { scan-assembler-times {\mxxsel\M} 7 } } */
>> +
>>
> 
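
A scalar C model of the big-endian P8 sequence quoted above may help when
checking the index arithmetic.  This is only a sketch, and it rests on one
assumption: that vperm with both inputs equal and an lvsl control vector
acts as a rotate-left by (sh & 15) bytes in big-endian byte numbering.  The
function names are hypothetical, and the little-endian branch is not
modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Models lvsl + vperm with both vperm inputs equal: rotate left by SH
   bytes, using big-endian byte numbering (byte 0 is most significant).  */
static void
rotate_left_bytes_model (uint8_t out[16], const uint8_t in[16], unsigned sh)
{
  for (unsigned i = 0; i < 16; i++)
    out[i] = in[(i + sh) & 15];
}

/* Hypothetical scalar model of the big-endian path: insert the low WIDTH
   bytes of VAL into element IDX of the 16-byte TARGET.  */
static void
insert_var_be_model (uint8_t target[16], uint32_t val, unsigned width,
                     unsigned idx)
{
  assert (width == 1 || width == 2 || width == 4);

  uint8_t mask[16] = { 0 }, value[16] = { 0 };
  uint8_t mask_perm[16], val_perm[16];

  /* Mask and value both start in the low bytes of doubleword 0,
     that is bytes 8 - WIDTH .. 7.  */
  for (unsigned b = 0; b < width; b++)
    {
      mask[8 - width + b] = 0xff;
      value[8 - width + b] = (uint8_t) (val >> (8 * (width - 1 - b)));
    }

  /* Same arithmetic as the patch's big-endian branch; only the low four
     bits matter to lvsl.  */
  unsigned sh = (24 - width - idx * width) & 15;

  rotate_left_bytes_model (val_perm, value, sh);
  rotate_left_bytes_model (mask_perm, mask, sh);

  /* xxsel: take bits from VAL_PERM where the mask is set, else keep
     TARGET.  */
  for (unsigned i = 0; i < 16; i++)
    target[i] = (uint8_t) ((target[i] & ~mask_perm[i])
                           | (val_perm[i] & mask_perm[i]));
}
```

For example, with target bytes 0..15, inserting 0xAABBCCDD with width 4 at
idx 2 gives sh = 12, overwrites bytes 8..11, and leaves the rest alone.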

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-12-03 14:16     ` Xionghu Luo
@ 2020-12-10  3:32       ` Xionghu Luo
  2020-12-23  2:18       ` Ping ^ 3: " Xionghu Luo
  1 sibling, 0 replies; 21+ messages in thread
From: Xionghu Luo @ 2020-12-10  3:32 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, segher, dje.gcc, linkw

Ping^2. Thanks.

On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
> Ping. Thanks.
> 
> 
> On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
>> Hi Segher,
>> Thanks for approving [PATCH 1/4] and [PATCH 2/4].  What is your opinion
>> of this [PATCH 3/4] for P8, please?  xxinsertw only exists since ISA
>> v3.0, so we had to implement it another way.
>>
>>
>> Xionghu
>>
> 

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Ping ^ 3: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-12-03 14:16     ` Xionghu Luo
  2020-12-10  3:32       ` Xionghu Luo
@ 2020-12-23  2:18       ` Xionghu Luo
  2021-01-15  2:48         ` Ping ^ 4: " Xionghu Luo
  1 sibling, 1 reply; 21+ messages in thread
From: Xionghu Luo @ 2020-12-23  2:18 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, segher, dje.gcc, linkw

Ping^3 for stage 3.

And this follow-up patch:
[PATCH 4/4] rs6000: Update testcases' instruction count.

Thanks:)


On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
> Ping. Thanks.
> 
> 
> On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
>> Hi Segher,
>> Thanks for approving [PATCH 1/4] and [PATCH 2/4].  What is your opinion
>> of this [PATCH 3/4] for P8, please?  xxinsertw only exists since ISA
>> v3.0, so we had to implement it another way.
>>
>>
>> Xionghu
>>
> 

-- 
Thanks,
Xionghu

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Ping ^ 4: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-12-23  2:18       ` Ping ^ 3: " Xionghu Luo
@ 2021-01-15  2:48         ` Xionghu Luo
  0 siblings, 0 replies; 21+ messages in thread
From: Xionghu Luo @ 2021-01-15  2:48 UTC (permalink / raw)
  To: gcc-patches; +Cc: wschmidt, dje.gcc, segher, linkw

Ping^4, thanks.


On 2020/12/23 10:18, Xionghu Luo via Gcc-patches wrote:
> Ping^3 for stage 3.
> 
> And this follow-up patch:
> [PATCH 4/4] rs6000: Update testcases' instruction count.
> 
> Thanks:)
> 
> 
> On 2020/12/3 22:16, Xionghu Luo via Gcc-patches wrote:
>> Ping. Thanks.
>>
>>
>> On 2020/11/27 09:04, Xionghu Luo via Gcc-patches wrote:
>>> Hi Segher,
>>> Thanks for approving [PATCH 1/4] and [PATCH 2/4].  What is your opinion
>>> of this [PATCH 3/4] for P8, please?  xxinsertw only exists since ISA
>>> v3.0, so we had to implement it another way.
>>>
>>>
>>> Xionghu
>>>
>>>
>>> On 2020/10/10 16:08, Xionghu Luo wrote:
>>>> gcc/ChangeLog:
>>>>
>>>> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
>>>>
>>>>     * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
>>>>     Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
>>>>     platforms.
>>>>     * config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
>>>>     to call different path for P8 and P9.
>>>>     (rs6000_expand_vector_set_var_p9): New function.
>>>>     (rs6000_expand_vector_set_var_p8): New function.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>>
>>>> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
>>>>
>>>>     * gcc.target/powerpc/pr79251.p8.c: New test.
>>>> ---
>>>>    gcc/config/rs6000/rs6000-c.c                  |  27 +++-
>>>>    gcc/config/rs6000/rs6000.c                    | 117 +++++++++++++++++-
>>>>    gcc/testsuite/gcc.target/powerpc/pr79251.p8.c |  17 +++
>>>>    3 files changed, 155 insertions(+), 6 deletions(-)
>>>>    create mode 100644 gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>>>>
>>>> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
>>>> index 5551a21d738..4bea8001ec6 100644
>>>> --- a/gcc/config/rs6000/rs6000-c.c
>>>> +++ b/gcc/config/rs6000/rs6000-c.c
>>>> @@ -1599,10 +1599,29 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl,
>>>>          SET_EXPR_LOCATION (stmt, loc);
>>>>          stmt = build1 (COMPOUND_LITERAL_EXPR, arg1_type, stmt);
>>>>        }
>>>> -      stmt = build_array_ref (loc, stmt, arg2);
>>>> -      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>>>> -              convert (TREE_TYPE (stmt), arg0));
>>>> -      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>> +
>>>> +      if (TARGET_P8_VECTOR)
>>>> +    {
>>>> +      stmt = build_array_ref (loc, stmt, arg2);
>>>> +      stmt = fold_build2 (MODIFY_EXPR, TREE_TYPE (arg0), stmt,
>>>> +                  convert (TREE_TYPE (stmt), arg0));
>>>> +      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>> +    }
>>>> +      else
>>>> +    {
>>>> +      tree arg1_inner_type;
>>>> +      tree innerptrtype;
>>>> +      arg1_inner_type = TREE_TYPE (arg1_type);
>>>> +      innerptrtype = build_pointer_type (arg1_inner_type);
>>>> +
>>>> +      stmt = build_unary_op (loc, ADDR_EXPR, stmt, 0);
>>>> +      stmt = convert (innerptrtype, stmt);
>>>> +      stmt = build_binary_op (loc, PLUS_EXPR, stmt, arg2, 1);
>>>> +      stmt = build_indirect_ref (loc, stmt, RO_NULL);
>>>> +      stmt = build2 (MODIFY_EXPR, TREE_TYPE (stmt), stmt,
>>>> +             convert (TREE_TYPE (stmt), arg0));
>>>> +      stmt = build2 (COMPOUND_EXPR, arg1_type, stmt, decl);
>>>> +    }
>>>>          return stmt;
>>>>        }
>>>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>>>> index 96f76c7a74c..33ca839cb28 100644
>>>> --- a/gcc/config/rs6000/rs6000.c
>>>> +++ b/gcc/config/rs6000/rs6000.c
>>>> @@ -6806,10 +6806,10 @@ rs6000_expand_vector_set (rtx target, rtx val, rtx elt_rtx)
>>>>    }
>>>>    /* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>>>> -   is variable and also counts by vector element size.  */
>>>> +   is variable and also counts by vector element size for p9 and above.  */
>>>>    void
>>>> -rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>>>> +rs6000_expand_vector_set_var_p9 (rtx target, rtx val, rtx idx)
>>>>    {
>>>>      machine_mode mode = GET_MODE (target);
>>>> @@ -6852,6 +6852,119 @@ rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>>>>      emit_insn (perml);
>>>>    }
>>>> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>>>> +   is variable and also counts by vector element size for p8.  */
>>>> +
>>>> +void
>>>> +rs6000_expand_vector_set_var_p8 (rtx target, rtx val, rtx idx)
>>>> +{
>>>> +  machine_mode mode = GET_MODE (target);
>>>> +
>>>> +  gcc_assert (VECTOR_MEM_VSX_P (mode) && !CONST_INT_P (idx));
>>>> +
>>>> +  gcc_assert (GET_MODE (idx) == E_SImode);
>>>> +
>>>> +  machine_mode inner_mode = GET_MODE (val);
>>>> +  HOST_WIDE_INT mode_mask = GET_MODE_MASK (inner_mode);
>>>> +
>>>> +  rtx tmp = gen_reg_rtx (GET_MODE (idx));
>>>> +  int width = GET_MODE_SIZE (inner_mode);
>>>> +
>>>> +  gcc_assert (width >= 1 && width <= 4);
>>>> +
>>>> +  if (!BYTES_BIG_ENDIAN)
>>>> +    {
>>>> +      /*  idx = idx * width.  */
>>>> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
>>>> +      /*  idx = idx + 8.  */
>>>> +      emit_insn (gen_addsi3 (tmp, tmp, GEN_INT (8)));
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      emit_insn (gen_mulsi3 (tmp, idx, GEN_INT (width)));
>>>> +      emit_insn (gen_subsi3 (tmp, GEN_INT (24 - width), tmp));
>>>> +    }
>>>> +
>>>> +  /*  lxv vs33, mask.
>>>> +      DImode: 0xffffffffffffffff0000000000000000
>>>> +      SImode: 0x00000000ffffffff0000000000000000
>>>> +      HImode: 0x000000000000ffff0000000000000000.
>>>> +      QImode: 0x00000000000000ff0000000000000000.  */
>>>> +  rtx mask = gen_reg_rtx (V16QImode);
>>>> +  rtx mask_v2di = gen_reg_rtx (V2DImode);
>>>> +  rtvec v = rtvec_alloc (2);
>>>> +  if (!BYTES_BIG_ENDIAN)
>>>> +    {
>>>> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, 0);
>>>> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, mode_mask);
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      RTVEC_ELT (v, 0) = gen_rtx_CONST_INT (DImode, mode_mask);
>>>> +      RTVEC_ELT (v, 1) = gen_rtx_CONST_INT (DImode, 0);
>>>> +    }
>>>> +  emit_insn (gen_vec_initv2didi (mask_v2di, gen_rtx_PARALLEL (V2DImode, v)));
>>>> +  rtx sub_mask = simplify_gen_subreg (V16QImode, mask_v2di, V2DImode, 0);
>>>> +  emit_insn (gen_rtx_SET (mask, sub_mask));
>>>> +
>>>> +  /*  mtvsrd[wz] f0,tmp_val.  */
>>>> +  rtx tmp_val = gen_reg_rtx (SImode);
>>>> +  if (inner_mode == E_SFmode)
>>>> +    emit_insn (gen_movsi_from_sf (tmp_val, val));
>>>> +  else
>>>> +    tmp_val = force_reg (SImode, val);
>>>> +
>>>> +  rtx val_v16qi = gen_reg_rtx (V16QImode);
>>>> +  rtx val_v2di = gen_reg_rtx (V2DImode);
>>>> +  rtvec vec_val = rtvec_alloc (2);
>>>> +  if (!BYTES_BIG_ENDIAN)
>>>> +  {
>>>> +    RTVEC_ELT (vec_val, 0) = gen_rtx_CONST_INT (DImode, 0);
>>>> +    RTVEC_ELT (vec_val, 1) = tmp_val;
>>>> +  }
>>>> +  else
>>>> +  {
>>>> +    RTVEC_ELT (vec_val, 0) = tmp_val;
>>>> +    RTVEC_ELT (vec_val, 1) = gen_rtx_CONST_INT (DImode, 0);
>>>> +  }
>>>> +  emit_insn (
>>>> +    gen_vec_initv2didi (val_v2di, gen_rtx_PARALLEL (V2DImode, vec_val)));
>>>> +  rtx sub_val = simplify_gen_subreg (V16QImode, val_v2di, V2DImode, 0);
>>>> +  emit_insn (gen_rtx_SET (val_v16qi, sub_val));
>>>> +
>>>> +  /*  lvsl    13,0,idx.  */
>>>> +  tmp = convert_modes (DImode, SImode, tmp, 1);
>>>> +  rtx pcv = gen_reg_rtx (V16QImode);
>>>> +  emit_insn (gen_altivec_lvsl_reg (pcv, tmp));
>>>> +
>>>> +  /*  vperm 1,1,1,13.  */
>>>> +  /*  vperm 0,0,0,13.  */
>>>> +  rtx val_perm = gen_reg_rtx (V16QImode);
>>>> +  rtx mask_perm = gen_reg_rtx (V16QImode);
>>>> +  emit_insn (gen_altivec_vperm_v8hiv16qi (val_perm, val_v16qi, val_v16qi, pcv));
>>>> +  emit_insn (gen_altivec_vperm_v8hiv16qi (mask_perm, mask, mask, pcv));
>>>> +
>>>> +  rtx target_v16qi = simplify_gen_subreg (V16QImode, target, mode, 0);
>>>> +
>>>> +  /*  xxsel 34,34,32,33.  */
>>>> +  emit_insn (
>>>> +    gen_vector_select_v16qi (target_v16qi, target_v16qi, val_perm, mask_perm));
>>>> +}
>>>> +
>>>> +/* Insert VAL into IDX of TARGET, VAL size is same of the vector element, IDX
>>>> +   is variable and also counts by vector element size.  */
>>>> +
>>>> +void
>>>> +rs6000_expand_vector_set_var (rtx target, rtx val, rtx idx)
>>>> +{
>>>> +  machine_mode mode = GET_MODE (target);
>>>> +  machine_mode inner_mode = GET_MODE_INNER (mode);
>>>> +  if (TARGET_P9_VECTOR || GET_MODE_SIZE (inner_mode) == 8)
>>>> +    rs6000_expand_vector_set_var_p9 (target, val, idx);
>>>> +  else
>>>> +    rs6000_expand_vector_set_var_p8 (target, val, idx);
>>>> +}
>>>> +
>>>>    /* Extract field ELT from VEC into TARGET.  */
>>>>    void
>>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>>>> new file mode 100644
>>>> index 00000000000..06da47b7758
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr79251.p8.c
>>>> @@ -0,0 +1,17 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target powerpc_p8vector_ok } */
>>>> +/* { dg-options "-O2 -mdejagnu-cpu=power8 -maltivec" } */
>>>> +
>>>> +#include <stddef.h>
>>>> +#include <altivec.h>
>>>> +#include "pr79251.h"
>>>> +
>>>> +TEST_VEC_INSERT_ALL (test)
>>>> +
>>>> +/* { dg-final { scan-assembler-not {\mstxw\M} } } */
>>>> +/* { dg-final { scan-assembler-times {\mlvsl\M} 10 } } */
>>>> +/* { dg-final { scan-assembler-times {\mlvsr\M} 3 } } */
>>>> +/* { dg-final { scan-assembler-times {\mvperm\M} 20 } } */
>>>> +/* { dg-final { scan-assembler-times {\mxxpermdi\M} 10 } } */
>>>> +/* { dg-final { scan-assembler-times {\mxxsel\M} 7 } } */
>>>> +
>>>>
>>>
>>
> 

-- 
Thanks,
Xionghu
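
For readers following the P8 path in the quoted patch: the sequence
builds a mask and a value vector, permutes both into the lane chosen by
the variable index, and merges them into the target with a select.  The
portable C sketch below models only that idea, on plain byte arrays in
memory order.  It is an illustration, not the code GCC emits: the names
vec_insert_select and rotate_bytes are hypothetical, the lvsl/vperm
register byte numbering and the endian-specific offsets from the patch
are not modelled, and the index is assumed to wrap modulo the number of
elements.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { VEC_BYTES = 16 };

/* Rotate a 16-byte vector: input byte i lands at (i + shift) % 16.
   This stands in for the lvsl + vperm pair in the patch.  */
static void
rotate_bytes (uint8_t out[VEC_BYTES], const uint8_t in[VEC_BYTES],
              unsigned shift)
{
  for (unsigned i = 0; i < VEC_BYTES; i++)
    out[(i + shift) % VEC_BYTES] = in[i];
}

/* Insert WIDTH bytes from VAL into element IDX of TARGET without a
   store/reload: build mask and value at a fixed lane, rotate both to
   the requested lane, then select bytewise (the xxsel step).  */
void
vec_insert_select (uint8_t target[VEC_BYTES], const void *val,
                   unsigned width, unsigned idx)
{
  assert (width == 1 || width == 2 || width == 4 || width == 8);

  uint8_t mask[VEC_BYTES] = { 0 };
  uint8_t value[VEC_BYTES] = { 0 };
  uint8_t mask_rot[VEC_BYTES];
  uint8_t value_rot[VEC_BYTES];

  memset (mask, 0xff, width);   /* Ones cover exactly one element.  */
  memcpy (value, val, width);   /* Value sits in the same fixed lane.  */

  /* Assumption: the index wraps modulo the element count.  */
  unsigned shift = (idx % (VEC_BYTES / width)) * width;

  rotate_bytes (mask_rot, mask, shift);
  rotate_bytes (value_rot, value, shift);

  /* Select: take VALUE where the mask is set, TARGET elsewhere.  */
  for (unsigned i = 0; i < VEC_BYTES; i++)
    target[i] = (uint8_t) ((target[i] & ~mask_rot[i])
                           | (value_rot[i] & mask_rot[i]));
}
```

Because the mask and the value go through the same permute, the select
touches only the bytes of the chosen element, which is why no store to
memory is needed for a variable index.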


* Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2020-10-10  8:08 ` [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8 Xionghu Luo
  2020-11-27  1:04   ` Xionghu Luo
@ 2021-01-21 23:48   ` Segher Boessenkool
  2021-01-22 20:08     ` David Edelsohn
  1 sibling, 1 reply; 21+ messages in thread
From: Segher Boessenkool @ 2021-01-21 23:48 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Hi!

You never committed 2/4?  That makes it harder to review this one :-)

On Sat, Oct 10, 2020 at 03:08:24AM -0500, Xionghu Luo wrote:
> gcc/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> 	Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
> 	platforms.
> 	* config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
> 	to call different path for P8 and P9.
> 	(rs6000_expand_vector_set_var_p9): New function.
> 	(rs6000_expand_vector_set_var_p8): New function.
> 
> gcc/testsuite/ChangeLog:
> 
> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* gcc.target/powerpc/pr79251.p8.c: New test.

If testing on P9 LE and P7 BE (32-bit and 64-bit) worked, this is okay
for trunk.  Thanks!

(Let me know if you need help testing.)


Segher


* Re: [PATCH 4/4] rs6000: Update testcases' instruction count
  2020-10-10  8:08 ` [PATCH 4/4] rs6000: Update testcases' instruction count Xionghu Luo
@ 2021-01-22  0:17   ` Segher Boessenkool
  2021-01-22 20:02     ` David Edelsohn
  0 siblings, 1 reply; 21+ messages in thread
From: Segher Boessenkool @ 2021-01-22  0:17 UTC (permalink / raw)
  To: Xionghu Luo; +Cc: gcc-patches, dje.gcc, wschmidt, guojiufu, linkw

Hi!

On Sat, Oct 10, 2020 at 03:08:25AM -0500, Xionghu Luo wrote:
> 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> 
> 	* gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
> 	instruction counts.
> 	* gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
> 	* gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
> 	* gcc.target/powerpc/vsx-builtin-7.c: Likewise.

Looks good.  I assume you tested all those changed counts are actual
wanted code?  Okay for trunk if so.  Thanks!


Segher


* Re: [PATCH 4/4] rs6000: Update testcases' instruction count
  2021-01-22  0:17   ` Segher Boessenkool
@ 2021-01-22 20:02     ` David Edelsohn
  2021-01-23  1:01       ` Segher Boessenkool
  0 siblings, 1 reply; 21+ messages in thread
From: David Edelsohn @ 2021-01-22 20:02 UTC (permalink / raw)
  To: Segher Boessenkool, Xionghu Luo
  Cc: GCC Patches, Bill Schmidt, guojiufu, linkw

All of these testcases now fail on AIX.  This was not tested properly.
Please fix.

Thanks, David

On Thu, Jan 21, 2021 at 7:19 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> On Sat, Oct 10, 2020 at 03:08:25AM -0500, Xionghu Luo wrote:
> > 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> >
> >       * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
> >       instruction counts.
> >       * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
> >       * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
> >       * gcc.target/powerpc/vsx-builtin-7.c: Likewise.
>
> Looks good.  I assume you tested all those changed counts are actual
> wanted code?  Okay for trunk if so.  Thanks!
>
>
> Segher


* Re: [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8
  2021-01-21 23:48   ` Segher Boessenkool
@ 2021-01-22 20:08     ` David Edelsohn
  0 siblings, 0 replies; 21+ messages in thread
From: David Edelsohn @ 2021-01-22 20:08 UTC (permalink / raw)
  To: Segher Boessenkool, Xionghu Luo
  Cc: GCC Patches, Bill Schmidt, guojiufu, linkw

On Thu, Jan 21, 2021 at 6:51 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> Hi!
>
> You never committed 2/4?  That makes it harder to review this one :-)
>
> On Sat, Oct 10, 2020 at 03:08:24AM -0500, Xionghu Luo wrote:
> > gcc/ChangeLog:
> >
> > 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> >
> >       * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin):
> >       Generate ARRAY_REF(VIEW_CONVERT_EXPR) for P8 and later
> >       platforms.
> >       * config/rs6000/rs6000.c (rs6000_expand_vector_set_var): Update
> >       to call different path for P8 and P9.
> >       (rs6000_expand_vector_set_var_p9): New function.
> >       (rs6000_expand_vector_set_var_p8): New function.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2020-10-10  Xionghu Luo  <luoxhu@linux.ibm.com>
> >
> >       * gcc.target/powerpc/pr79251.p8.c: New test.
>
> If testing on P9 LE and P7 BE (32-bit and 64-bit) worked, this is okay
> for trunk.  Thanks!

This testcase ICEs on AIX.  Please fix.  This was not tested properly.

The new pattern does not have matching target conditions for patterns
on which it relies.

Thanks, David


* Re: [PATCH 4/4] rs6000: Update testcases' instruction count
  2021-01-22 20:02     ` David Edelsohn
@ 2021-01-23  1:01       ` Segher Boessenkool
  2021-01-23  1:24         ` David Edelsohn
  0 siblings, 1 reply; 21+ messages in thread
From: Segher Boessenkool @ 2021-01-23  1:01 UTC (permalink / raw)
  To: David Edelsohn; +Cc: Xionghu Luo, GCC Patches, Bill Schmidt, guojiufu, linkw

On Fri, Jan 22, 2021 at 03:02:47PM -0500, David Edelsohn wrote:
> All of these testcases now fail on AIX.  This was not tested properly.
> Please fix.

They fail on -m32 Linux as well: all failures are an unexpected count
of addi insns.  This may be related to the LRA regression we have (just
based on it being addi, nothing else; this is a shot in the dark).  It
could of course be something different just as well :-)


Segher


> > >       * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
> > >       instruction counts.
> > >       * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
> > >       * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
> > >       * gcc.target/powerpc/vsx-builtin-7.c: Likewise.


* Re: [PATCH 4/4] rs6000: Update testcases' instruction count
  2021-01-23  1:01       ` Segher Boessenkool
@ 2021-01-23  1:24         ` David Edelsohn
  0 siblings, 0 replies; 21+ messages in thread
From: David Edelsohn @ 2021-01-23  1:24 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: Xionghu Luo, GCC Patches, Bill Schmidt, guojiufu, linkw

Those are the fold-vec-extract-* changes.  And they fix a regression
on AIX.  Another difference to detangle.

I'm referring to the new fold-vec-insert-* failures.  I fixed the p9
failures, but some of the tests now ICE when targeting P8.

Thanks, David

On Fri, Jan 22, 2021 at 8:03 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Jan 22, 2021 at 03:02:47PM -0500, David Edelsohn wrote:
> > All of these testcases now fail on AIX.  This was not tested properly.
> > Please fix.
>
> They fail on -m32 Linux as well: all failures are an unexpected count
> of addi insns.  This may be related to the LRA regression we have (just
> based on it being addi, nothing else; this is a shot in the dark).  It
> could of course be something different just as well :-)
>
>
> Segher
>
>
> > > >       * gcc.target/powerpc/fold-vec-insert-char-p8.c: Adjust
> > > >       instruction counts.
> > > >       * gcc.target/powerpc/fold-vec-insert-char-p9.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-double.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-float-p8.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-float-p9.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-int-p8.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-int-p9.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-longlong.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-short-p8.c: Likewise.
> > > >       * gcc.target/powerpc/fold-vec-insert-short-p9.c: Likewise.
> > > >       * gcc.target/powerpc/vsx-builtin-7.c: Likewise.


end of thread, other threads:[~2021-01-23  1:25 UTC | newest]

Thread overview: 21+ messages
2020-10-10  8:08 [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
2020-10-10  8:08 ` [PATCH 1/4] rs6000: Change rs6000_expand_vector_set param Xionghu Luo
2020-11-24 19:44   ` Segher Boessenkool
2020-10-10  8:08 ` [PATCH 2/4] rs6000: Support variable insert and Expand vec_insert in expander [PR79251] Xionghu Luo
2020-11-24 22:37   ` Segher Boessenkool
2020-10-10  8:08 ` [PATCH 3/4] rs6000: Enable vec_insert for P8 with rs6000_expand_vector_set_var_p8 Xionghu Luo
2020-11-27  1:04   ` Xionghu Luo
2020-12-03 14:16     ` Xionghu Luo
2020-12-10  3:32       ` Xionghu Luo
2020-12-23  2:18       ` Ping ^ 3: " Xionghu Luo
2021-01-15  2:48         ` Ping ^ 4: " Xionghu Luo
2021-01-21 23:48   ` Segher Boessenkool
2021-01-22 20:08     ` David Edelsohn
2020-10-10  8:08 ` [PATCH 4/4] rs6000: Update testcases' instruction count Xionghu Luo
2021-01-22  0:17   ` Segher Boessenkool
2021-01-22 20:02     ` David Edelsohn
2021-01-23  1:01       ` Segher Boessenkool
2021-01-23  1:24         ` David Edelsohn
2020-11-05  1:34 ` Ping: [PATCH 0/4] rs6000: Enable variable vec_insert with IFN VEC_SET Xionghu Luo
2020-11-13  2:05   ` Ping^2: " Xionghu Luo
2020-11-24  2:29     ` Ping^3: " Xionghu Luo
