public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
@ 2019-03-20  3:33 Kewen.Lin
  2019-04-03 22:00 ` [PING] " Kewen.Lin
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Kewen.Lin @ 2019-03-20  3:33 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther

Hi,

Please refer to the link below for the previous threads.
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html

Compared with patch v2, I've moved the vector operation target
check up together with the vector type target check.  Besides, I
ran bootstrap and regtest on powerpc64-linux-gnu (BE) and updated
the testcases' requirements and options for robustness.

Is it OK for GCC10?
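For readers following along, the transformation can be illustrated with a
small standalone sketch (mirroring the pr88497-1.c testcase; illustrative
only, not part of the patch):

```c
/* Sketch of the reassoc undistribution on vector element sums:
   summing element extractions from several vectors is rewritten as
   one vector-wide sum followed by two element extractions.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Before: eight BIT_FIELD_REF-style element extractions.  */
double
sum_before (double acc, v2df v0, v2df v1, v2df v2, v2df v3)
{
  acc += v0[0] + v0[1] + v1[0] + v1[1]
	 + v2[0] + v2[1] + v3[0] + v3[1];
  return acc;
}

/* After: one vector-wide sum, then only two extractions.  */
double
sum_after (double acc, v2df v0, v2df v1, v2df v2, v2df v3)
{
  v2df t = (v0 + v1) + (v2 + v3);
  acc += t[0] + t[1];
  return acc;
}
```

Both forms compute the same value here; the second needs only two element
extractions instead of eight, which is what the reassoc1 dump scans below
count.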


gcc/ChangeLog

2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>

	PR target/88497
	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
	function undistribute_bitref_for_vector.
	(undistribute_bitref_for_vector): New function.
	(cleanup_vinfo_map): Likewise.
	(unsigned_cmp): Likewise.

gcc/testsuite/ChangeLog

2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.dg/tree-ssa/pr88497-1.c: New test.
	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.

---
 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
 gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
 6 files changed, 477 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
new file mode 100644
index 0000000..99c9af8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
+
+/* Test that reassoc can undistribute vector bit_field_ref summation.
+
+   arg1 and arg2 are two arrays whose elements are of type vector double.
+   Assuming:
+     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
+     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
+
+   Then:
+     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
+
+   reassoc transforms
+
+     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
+		  + V3[0] + V3[1];
+
+   into:
+
+     T = V0 + V1 + V2 + V3
+     accumulator += T[0] + T[1];
+
+   Fewer bit_field_refs: only two for a vector of 128 or more bits.  */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+double
+test (double accumulator, v2df arg1[], v2df arg2[])
+{
+  v2df temp;
+  temp = arg1[0] * arg2[0];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[1] * arg2[1];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[2] * arg2[2];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[3] * arg2[3];
+  accumulator += temp[0] + temp[1];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
new file mode 100644
index 0000000..61ed0bf5
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
+
+/* Test that reassoc can undistribute vector bit_field_ref on multiplication.
+
+   v1, v2, v3 and v4 are of type vector float.
+
+   reassoc transforms
+
+     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
+                    v2[0] * v2[1] * v2[2] * v2[3] *
+                    v3[0] * v3[1] * v3[2] * v3[3] *
+                    v4[0] * v4[1] * v4[2] * v4[3] ;
+
+   into:
+
+     T = v1 * v2 * v3 * v4;
+     accumulator *= T[0] * T[1] * T[2] * T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
+
+typedef float v4si __attribute__((vector_size(16)));
+float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
+  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
+  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
+  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
+  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
new file mode 100644
index 0000000..3790afc
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
+
+/* Test that reassoc can undistribute vector bit_field_ref on bitwise AND.
+
+   v1, v2, v3 and v4 are of type vector int.
+
+   reassoc transforms
+
+     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
+                    v2[0] & v2[1] & v2[2] & v2[3] &
+                    v3[0] & v3[1] & v3[2] & v3[3] &
+                    v4[0] & v4[1] & v4[2] & v4[3] ;
+
+   into:
+
+     T = v1 & v2 & v3 & v4;
+     accumulator &= T[0] & T[1] & T[2] & T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__((vector_size(16)));
+int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
+  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
+  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
+  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
+  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
new file mode 100644
index 0000000..1864aad
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
+
+/* Test that reassoc can undistribute vector bit_field_ref on bitwise IOR.
+
+   v1, v2, v3 and v4 are of type vector int.
+
+   reassoc transforms
+
+     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
+                    v2[0] | v2[1] | v2[2] | v2[3] |
+                    v3[0] | v3[1] | v3[2] | v3[3] |
+                    v4[0] | v4[1] | v4[2] | v4[3] ;
+
+   into:
+
+     T = v1 | v2 | v3 | v4;
+     accumulator |= T[0] | T[1] | T[2] | T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__((vector_size(16)));
+int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
+  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
+  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
+  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
+  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
new file mode 100644
index 0000000..f747372
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
@@ -0,0 +1,33 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-options "-O2 -ffast-math" } */
+/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
+
+/* Test that reassoc can undistribute vector bit_field_ref on bitwise XOR.
+
+   v1, v2, v3 and v4 are of type vector int.
+
+   reassoc transforms
+
+     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
+                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
+                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
+                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
+
+   into:
+
+     T = v1 ^ v2 ^ v3 ^ v4;
+     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__((vector_size(16)));
+int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
+  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
+  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
+  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
+  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index e1c4dfe..a6cd85a 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
   return changed;
 }
 
+/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
+    - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
+    - ops_indexes: the index into vec *OPS for each relevant BIT_FIELD_REF.  */
+struct v_info
+{
+  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
+  auto_vec<unsigned, 32> ops_indexes;
+};
+
+typedef struct v_info *v_info_ptr;
+
+/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
+static int
+unsigned_cmp (const void *p_i, const void *p_j)
+{
+  if (*(const unsigned HOST_WIDE_INT *) p_i
+      >= *(const unsigned HOST_WIDE_INT *) p_j)
+    return 1;
+  else
+    return -1;
+}
+
+/* Clean up the hash map for VECTOR information.  */
+static void
+cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
+{
+  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
+       it != info_map.end (); ++it)
+    {
+      v_info_ptr info = (*it).second;
+      delete info;
+      (*it).second = NULL;
+    }
+}
+
+/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
+     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
+   is transformed to
+     Vs = (V1 + V2 + ... + Vn)
+     Vs[0] + Vs[1] + ... + Vs[k]
+
+   The basic steps are listed below:
+
+    1) Check the addition chain *OPS, looking for summands that come
+       from a BIT_FIELD_REF on a VECTOR type.  Put the information into
+       v_info_map for each satisfying summand, using the VECTOR SSA_NAME as key.
+
+    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs
+       are continuous and cover the whole VECTOR perfectly without any holes.
+       Obtain one VECTOR list containing the candidates to be transformed.
+
+    3) Build the addition statements for all VECTOR candidates and generate
+       BIT_FIELD_REFs accordingly.
+
+   TODO:
+    1) The current implementation restricts all candidate VECTORs to the
+       same VECTOR type, but it can be extended to handle different groups
+       by VECTOR type in the future if any profitable cases are found.
+    2) The current implementation requires the whole VECTOR to be fully
+       covered, but it can be extended to support partial coverage, i.e.
+       adjacent accesses that do not fill the whole VECTOR; that may need
+       some cost model to decide where to draw the boundary.
+*/
+static bool
+undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
+			     struct loop *loop)
+{
+  if (ops->length () <= 1)
+    return false;
+
+  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
+      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
+    return false;
+
+  hash_map<tree, v_info_ptr> v_info_map;
+  operand_entry *oe1;
+  unsigned i;
+
+  /* Find the summands coming from VECTOR BIT_FIELD_REFs in the addition
+     chain and put the information into the map.  */
+  FOR_EACH_VEC_ELT (*ops, i, oe1)
+    {
+      enum tree_code dcode;
+      gimple *oe1def;
+
+      if (TREE_CODE (oe1->op) != SSA_NAME)
+	continue;
+      oe1def = SSA_NAME_DEF_STMT (oe1->op);
+      if (!is_gimple_assign (oe1def))
+	continue;
+      dcode = gimple_assign_rhs_code (oe1def);
+      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
+	continue;
+
+      tree rhs = gimple_op (oe1def, 1);
+      tree op0 = TREE_OPERAND (rhs, 0);
+      tree vec_type = TREE_TYPE (op0);
+
+      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
+	continue;
+
+      tree op1 = TREE_OPERAND (rhs, 1);
+      tree op2 = TREE_OPERAND (rhs, 2);
+
+      tree elem_type = TREE_TYPE (vec_type);
+      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+      if (size != TREE_INT_CST_LOW (op1))
+	continue;
+
+      /* Ignore it if target machine can't support this VECTOR type.  */
+      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
+	continue;
+
+      /* Ignore it if target machine can't support this type of VECTOR
+         operation.  */
+      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
+      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
+	continue;
+
+      v_info_ptr *info_ptr = v_info_map.get (op0);
+      if (info_ptr)
+	{
+	  v_info_ptr info = *info_ptr;
+	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
+	  info->ops_indexes.safe_push (i);
+	}
+      else
+	{
+	  v_info_ptr info = new v_info;
+	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
+	  info->ops_indexes.safe_push (i);
+	  v_info_map.put (op0, info);
+	}
+    }
+
+  /* At least two VECTORs are needed to combine.  */
+  if (v_info_map.elements () <= 1)
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  /* Use the first VECTOR and its information as the reference.
+     First, validate it, that is:
+       1) the sorted offsets are adjacent, with no holes;
+       2) they fill the whole VECTOR perfectly.  */
+  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
+  tree ref_vec = (*it).first;
+  v_info_ptr ref_info = (*it).second;
+  ref_info->offsets.qsort (unsigned_cmp);
+  tree vec_type = TREE_TYPE (ref_vec);
+  tree elem_type = TREE_TYPE (vec_type);
+  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+  unsigned HOST_WIDE_INT curr;
+  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
+
+  /* Continuity check: sorted offsets must be adjacent.  */
+  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
+    {
+      if (curr != (prev + elem_size))
+	{
+	  cleanup_vinfo_map (v_info_map);
+	  return false;
+	}
+      prev = curr;
+    }
+
+  /* Check whether the accesses fill the whole VECTOR.  */
+  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  auto_vec<tree> vectors (v_info_map.elements ());
+  vectors.quick_push (ref_vec);
+
+  /* Use the ref_vec to filter others.  */
+  for (++it; it != v_info_map.end (); ++it)
+    {
+      tree vec = (*it).first;
+      v_info_ptr info = (*it).second;
+      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
+	continue;
+      if (ref_info->offsets.length () != info->offsets.length ())
+	continue;
+      bool same_offset = true;
+      info->offsets.qsort (unsigned_cmp);
+      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
+	{
+	  if (ref_info->offsets[i] != info->offsets[i])
+	    {
+	      same_offset = false;
+	      break;
+	    }
+	}
+      if (!same_offset)
+	continue;
+      vectors.quick_push (vec);
+    }
+
+  if (vectors.length () < 2)
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  tree tr;
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
+      FOR_EACH_VEC_ELT (vectors, i, tr)
+	{
+	  print_generic_expr (dump_file, tr);
+	  fprintf (dump_file, "  ");
+	}
+      fprintf (dump_file, "\n");
+    }
+
+  /* Build the sum for all candidate VECTORs.  */
+  unsigned idx;
+  gimple *sum = NULL;
+  v_info_ptr info;
+  tree sum_vec = ref_vec;
+  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
+    {
+      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
+      info = *(v_info_map.get (tr));
+      unsigned j;
+      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
+	{
+	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+	  gimple_set_visited (def, true);
+	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
+	      || opcode == BIT_IOR_EXPR)
+	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
+	  else if (opcode == MULT_EXPR)
+	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
+	  else
+	    {
+	      gcc_assert (opcode == BIT_AND_EXPR);
+	      (*ops)[idx]->op
+		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
+	    }
+	  (*ops)[idx]->rank = 0;
+	}
+      sum_vec = gimple_get_lhs (sum);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Generating addition -> ");
+	  print_gimple_stmt (dump_file, sum, 0);
+	}
+    }
+
+  /* Referring to any validated VECTOR (here using ref_vec), generate the
+     BIT_FIELD_REF statements accordingly.  */
+  info = *(v_info_map.get (ref_vec));
+  gcc_assert (sum);
+  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
+    {
+      tree dst = make_ssa_name (elem_type);
+      gimple *gs
+	= gimple_build_assign (dst, BIT_FIELD_REF,
+			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
+				       TYPE_SIZE (elem_type),
+				       bitsize_int (info->offsets[i])));
+      insert_stmt_after (gs, sum);
+      update_stmt (gs);
+      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+      gimple_set_visited (def, true);
+      (*ops)[idx]->op = gimple_assign_lhs (gs);
+      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Generating bit_field_ref -> ");
+	  print_gimple_stmt (dump_file, gs, 0);
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
+    }
+
+  cleanup_vinfo_map (v_info_map);
+
+  return true;
+}
+
 /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
    expression, examine the other OPS to see if any of them are comparisons
    of the same values, which we may be able to combine or eliminate.
@@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
-	  /* If this is not a gimple binary expression, there is
-	     nothing for us to do with it.  */
-	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
-	    continue;
-
 	  /* If this was part of an already processed statement,
 	     we don't need to touch it again. */
 	  if (gimple_visited_p (stmt))
@@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
 	      continue;
 	    }
 
+	  /* If this is not a gimple binary expression, there is
+	     nothing for us to do with it.  */
+	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
+	    continue;
+
 	  lhs = gimple_assign_lhs (stmt);
 	  rhs1 = gimple_assign_rhs1 (stmt);
 	  rhs2 = gimple_assign_rhs2 (stmt);
@@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
 		  optimize_ops_list (rhs_code, &ops);
 		}
 
+	      if (undistribute_bitref_for_vector (rhs_code, &ops,
+						  loop_containing_stmt (stmt)))
+		{
+		  ops.qsort (sort_by_operand_rank);
+		  optimize_ops_list (rhs_code, &ops);
+		}
+
 	      if (rhs_code == PLUS_EXPR
 		  && transform_add_to_multiply (&ops))
 		ops.qsort (sort_by_operand_rank);
-- 
2.7.4



^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PING] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-03-20  3:33 [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref Kewen.Lin
@ 2019-04-03 22:00 ` Kewen.Lin
  2019-05-05  6:15 ` Kewen.Lin
  2019-07-02 12:43 ` Richard Biener
  2 siblings, 0 replies; 16+ messages in thread
From: Kewen.Lin @ 2019-04-03 22:00 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther

Hi,

Gentle ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html

Thanks!

on 2019/3/19 11:14 AM, Kewen.Lin wrote:
> Hi,
> 
> Please refer to the link below for the previous threads.
> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
> 
> Compared with patch v2, I've moved the vector operation target
> check up together with the vector type target check.  Besides, I
> ran bootstrap and regtest on powerpc64-linux-gnu (BE) and updated
> the testcases' requirements and options for robustness.
> 
> Is it OK for GCC10?
> 
> 
> gcc/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	PR target/88497
> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
> 	function undistribute_bitref_for_vector.
> 	(undistribute_bitref_for_vector): New function.
> 	(cleanup_vinfo_map): Likewise.
> 	(unsigned_cmp): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PING] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-03-20  3:33 [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref Kewen.Lin
  2019-04-03 22:00 ` [PING] " Kewen.Lin
@ 2019-05-05  6:15 ` Kewen.Lin
  2019-05-21  2:03   ` Kewen.Lin
  2019-07-02 12:43 ` Richard Biener
  2 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-05-05  6:15 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther

Hi,

I'd like to gentle ping for this patch:
https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html

OK for trunk now?

Thanks!

on 2019/3/20 11:14 AM, Kewen.Lin wrote:
> Hi,
> 
> Please refer to the link below for the previous threads.
> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
> 
> Compared with patch v2, I've moved the vector operation target
> check up together with the vector type target check.  Besides, I
> ran bootstrap and regtest on powerpc64-linux-gnu (BE) and updated
> the testcases' requirements and options for robustness.
> 
> Is it OK for GCC10?
> 
> 
> gcc/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	PR target/88497
> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
> 	function undistribute_bitref_for_vector.
> 	(undistribute_bitref_for_vector): New function.
> 	(cleanup_vinfo_map): Likewise.
> 	(unsigned_cmp): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
>  gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
>  6 files changed, 477 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> new file mode 100644
> index 0000000..99c9af8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref summation.
> +
> +   arg1 and arg2 are two arrays whose elements of type vector double.
> +   Assuming:
> +     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
> +     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
> +
> +   Then:
> +     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
> +
> +   reassoc transforms
> +
> +     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
> +		  + V3[0] + V3[1];
> +
> +   into:
> +
> +     T = V0 + V1 + V2 + V3
> +     accumulator += T[0] + T[1];
> +
> +   Fewer bit_field_refs, only two for 128 or more bits vector.  */
> +
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> +  v2df temp;
> +  temp = arg1[0] * arg2[0];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[1] * arg2[1];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[2] * arg2[2];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[3] * arg2[3];
> +  accumulator += temp[0] + temp[1];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> new file mode 100644
> index 0000000..61ed0bf5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
> +
> +   v1, v2, v3, v4 of type vector float.
> +
> +   reassoc transforms
> +
> +     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
> +                    v2[0] * v2[1] * v2[2] * v2[3] *
> +                    v3[0] * v3[1] * v3[2] * v3[3] *
> +                    v4[0] * v4[1] * v4[2] * v4[3] ;
> +
> +   into:
> +
> +     T = v1 * v2 * v3 * v4;
> +     accumulator *= T[0] * T[1] * T[2] * T[3];
> +
> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
> +
> +typedef float v4si __attribute__((vector_size(16)));
> +float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
> +  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
> +  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
> +  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
> new file mode 100644
> index 0000000..3790afc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
> +                    v2[0] & v2[1] & v2[2] & v2[3] &
> +                    v3[0] & v3[1] & v3[2] & v3[3] &
> +                    v4[0] & v4[1] & v4[2] & v4[3] ;
> +
> +   into:
> +
> +     T = v1 & v2 & v3 & v4;
> +     accumulator &= T[0] & T[1] & T[2] & T[3];
> +
> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
> +  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
> +  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
> +  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
> new file mode 100644
> index 0000000..1864aad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
> +                    v2[0] | v2[1] | v2[2] | v2[3] |
> +                    v3[0] | v3[1] | v3[2] | v3[3] |
> +                    v4[0] | v4[1] | v4[2] | v4[3] ;
> +
> +   into:
> +
> +     T = v1 | v2 | v3 | v4;
> +     accumulator |= T[0] | T[1] | T[2] | T[3];
> +
> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
> +  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
> +  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
> +  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> new file mode 100644
> index 0000000..f747372
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
> +                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
> +                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
> +                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
> +
> +   into:
> +
> +     T = v1 ^ v2 ^ v3 ^ v4;
> +     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
> +
> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
> +  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
> +  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
> +  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index e1c4dfe..a6cd85a 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
>    return changed;
>  }
>  
> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> +    - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
> +    - ops_indexes: the index into vec *OPS for each relevant BIT_FIELD_REF.  */
> +struct v_info
> +{
> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
> +  auto_vec<unsigned, 32> ops_indexes;
> +};
> +
> +typedef struct v_info *v_info_ptr;
> +
> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
> +static int
> +unsigned_cmp (const void *p_i, const void *p_j)
> +{
> +  if (*(const unsigned HOST_WIDE_INT *) p_i
> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
> +    return 1;
> +  else
> +    return -1;
> +}
> +
> +/* Cleanup hash map for VECTOR information.  */
> +static void
> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
> +{
> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
> +       it != info_map.end (); ++it)
> +    {
> +      v_info_ptr info = (*it).second;
> +      delete info;
> +      (*it).second = NULL;
> +    }
> +}
> +
> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
> +   is transformed to
> +     Vs = (V1 + V2 + ... + Vn)
> +     Vs[0] + Vs[1] + ... + Vs[k]
> +
> +   The basic steps are listed below:
> +
> +    1) Check the addition chain *OPS, looking for summands that come from
> +       a BIT_FIELD_REF on a VECTOR type.  For each satisfied summand, put
> +       the information into v_info_map, using the VECTOR SSA_NAME as key.
> +
> +    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs
> +       are continuous and cover the whole VECTOR perfectly without any
> +       holes.  Obtain one VECTOR list containing the candidates to be
> +       transformed.
> +
> +    3) Build the addition statements for all VECTOR candidates, and
> +       generate BIT_FIELD_REFs accordingly.
> +
> +   TODO:
> +    1) The current implementation restricts all candidate VECTORs to the
> +       same VECTOR type, but it could be extended to handle different
> +       groups by VECTOR type in the future if profitable cases are found.
> +    2) The current implementation requires each VECTOR to be fully
> +       covered, but it could be extended to support partial coverage
> +       (adjacent elements that do not fill the whole VECTOR); that may
> +       need a cost model to decide whether the transform pays off.  */
> +static bool
> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
> +			     struct loop *loop)
> +{
> +  if (ops->length () <= 1)
> +    return false;
> +
> +  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
> +      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
> +    return false;
> +
> +  hash_map<tree, v_info_ptr> v_info_map;
> +  operand_entry *oe1;
> +  unsigned i;
> +
> +  /* Find the summands that come from a VECTOR BIT_FIELD_REF in the addition
> +     chain, and record their information in the map.  */
> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
> +    {
> +      enum tree_code dcode;
> +      gimple *oe1def;
> +
> +      if (TREE_CODE (oe1->op) != SSA_NAME)
> +	continue;
> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
> +      if (!is_gimple_assign (oe1def))
> +	continue;
> +      dcode = gimple_assign_rhs_code (oe1def);
> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
> +	continue;
> +
> +      tree rhs = gimple_op (oe1def, 1);
> +      tree op0 = TREE_OPERAND (rhs, 0);
> +      tree vec_type = TREE_TYPE (op0);
> +
> +      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
> +	continue;
> +
> +      tree op1 = TREE_OPERAND (rhs, 1);
> +      tree op2 = TREE_OPERAND (rhs, 2);
> +
> +      tree elem_type = TREE_TYPE (vec_type);
> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> +      if (size != TREE_INT_CST_LOW (op1))
> +	continue;
> +
> +      /* Ignore it if the target machine can't support this VECTOR type.  */
> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
> +	continue;
> +
> +      /* Ignore it if the target machine can't support this type of VECTOR
> +         operation.  */
> +      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
> +      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
> +	continue;
> +
> +      v_info_ptr *info_ptr = v_info_map.get (op0);
> +      if (info_ptr)
> +	{
> +	  v_info_ptr info = *info_ptr;
> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> +	  info->ops_indexes.safe_push (i);
> +	}
> +      else
> +	{
> +	  v_info_ptr info = new v_info;
> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> +	  info->ops_indexes.safe_push (i);
> +	  v_info_map.put (op0, info);
> +	}
> +    }
> +
> +  /* We need at least two VECTORs to combine.  */
> +  if (v_info_map.elements () <= 1)
> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }
> +
> +  /* Use the first VECTOR and its information as the reference.
> +     First, validate it, that is, check that:
> +       1) the sorted offsets are adjacent, with no holes;
> +       2) they fill the whole VECTOR perfectly.  */
> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
> +  tree ref_vec = (*it).first;
> +  v_info_ptr ref_info = (*it).second;
> +  ref_info->offsets.qsort (unsigned_cmp);
> +  tree vec_type = TREE_TYPE (ref_vec);
> +  tree elem_type = TREE_TYPE (vec_type);
> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> +  unsigned HOST_WIDE_INT curr;
> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
> +
> +  /* Continuity check.  */
> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
> +    {
> +      if (curr != (prev + elem_size))
> +	{
> +	  cleanup_vinfo_map (v_info_map);
> +	  return false;
> +	}
> +      prev = curr;
> +    }
> +
> +  /* Check whether the refs fill the whole VECTOR.  */
> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }
> +
> +  auto_vec<tree> vectors (v_info_map.elements ());
> +  vectors.quick_push (ref_vec);
> +
> +  /* Use the ref_vec to filter others.  */
> +  for (++it; it != v_info_map.end (); ++it)
> +    {
> +      tree vec = (*it).first;
> +      v_info_ptr info = (*it).second;
> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
> +	continue;
> +      if (ref_info->offsets.length () != info->offsets.length ())
> +	continue;
> +      bool same_offset = true;
> +      info->offsets.qsort (unsigned_cmp);
> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
> +	{
> +	  if (ref_info->offsets[i] != info->offsets[i])
> +	    {
> +	      same_offset = false;
> +	      break;
> +	    }
> +	}
> +      if (!same_offset)
> +	continue;
> +      vectors.quick_push (vec);
> +    }
> +
> +  if (vectors.length () < 2)
> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }
> +
> +  tree tr;
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
> +      FOR_EACH_VEC_ELT (vectors, i, tr)
> +	{
> +	  print_generic_expr (dump_file, tr);
> +	  fprintf (dump_file, "  ");
> +	}
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  /* Build the sum for all candidate VECTORs.  */
> +  unsigned idx;
> +  gimple *sum = NULL;
> +  v_info_ptr info;
> +  tree sum_vec = ref_vec;
> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
> +    {
> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
> +      info = *(v_info_map.get (tr));
> +      unsigned j;
> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
> +	{
> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> +	  gimple_set_visited (def, true);
> +	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
> +	      || opcode == BIT_IOR_EXPR)
> +	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
> +	  else if (opcode == MULT_EXPR)
> +	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
> +	  else
> +	    {
> +	      gcc_assert (opcode == BIT_AND_EXPR);
> +	      (*ops)[idx]->op
> +		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
> +	    }
> +	  (*ops)[idx]->rank = 0;
> +	}
> +      sum_vec = gimple_get_lhs (sum);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fprintf (dump_file, "Generating addition -> ");
> +	  print_gimple_stmt (dump_file, sum, 0);
> +	}
> +    }
> +
> +  /* Referring to any validated VECTOR (here ref_vec), generate the
> +     BIT_FIELD_REF statements accordingly.  */
> +  info = *(v_info_map.get (ref_vec));
> +  gcc_assert (sum);
> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
> +    {
> +      tree dst = make_ssa_name (elem_type);
> +      gimple *gs
> +	= gimple_build_assign (dst, BIT_FIELD_REF,
> +			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
> +				       TYPE_SIZE (elem_type),
> +				       bitsize_int (info->offsets[i])));
> +      insert_stmt_after (gs, sum);
> +      update_stmt (gs);
> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> +      gimple_set_visited (def, true);
> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fprintf (dump_file, "Generating bit_field_ref -> ");
> +	  print_gimple_stmt (dump_file, gs, 0);
> +	}
> +    }
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
> +    }
> +
> +  cleanup_vinfo_map (v_info_map);
> +
> +  return true;
> +}
> +
>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>     expression, examine the other OPS to see if any of them are comparisons
>     of the same values, which we may be able to combine or eliminate.
> @@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>  
> -	  /* If this is not a gimple binary expression, there is
> -	     nothing for us to do with it.  */
> -	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> -	    continue;
> -
>  	  /* If this was part of an already processed statement,
>  	     we don't need to touch it again. */
>  	  if (gimple_visited_p (stmt))
> @@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
>  	      continue;
>  	    }
>  
> +	  /* If this is not a gimple binary expression, there is
> +	     nothing for us to do with it.  */
> +	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> +	    continue;
> +
>  	  lhs = gimple_assign_lhs (stmt);
>  	  rhs1 = gimple_assign_rhs1 (stmt);
>  	  rhs2 = gimple_assign_rhs2 (stmt);
> @@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
>  		  optimize_ops_list (rhs_code, &ops);
>  		}
>  
> +	      if (undistribute_bitref_for_vector (rhs_code, &ops,
> +						  loop_containing_stmt (stmt)))
> +		{
> +		  ops.qsort (sort_by_operand_rank);
> +		  optimize_ops_list (rhs_code, &ops);
> +		}
> +
>  	      if (rhs_code == PLUS_EXPR
>  		  && transform_add_to_multiply (&ops))
>  		ops.qsort (sort_by_operand_rank);
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PING] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-05-05  6:15 ` Kewen.Lin
@ 2019-05-21  2:03   ` Kewen.Lin
  2019-06-11  2:46     ` [PING^4] " Kewen.Lin
  0 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-05-21  2:03 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther

Hi,

Gentle ping again.  Thanks!


Kewen

on 2019/5/5 2:15 PM, Kewen.Lin wrote:
> Hi,
> 
> I'd like to gentle ping for this patch:
> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html
> 
> OK for trunk now?
> 
> Thanks!
> 
> on 2019/3/20 11:14 AM, Kewen.Lin wrote:
>> Hi,
>>
>> Please refer to below link for previous threads.
>> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
>>
>> Comparing to patch v2, I've moved up the vector operation target 
>> check upward together with vector type target check.  Besides, I
>> ran bootstrap and regtest on powerpc64-linux-gnu (BE), updated 
>> testcases' requirements and options for robustness.
>>
>> Is it OK for GCC10?
>>
>>
>> gcc/ChangeLog
>>
>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>
>> 	PR target/88497
>> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
>> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
>> 	function undistribute_bitref_for_vector.
>> 	(undistribute_bitref_for_vector): New function.
>> 	(cleanup_vinfo_map): Likewise.
>> 	(unsigned_cmp): Likewise.
>>
>> gcc/testsuite/ChangeLog
>>
>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>
>> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
>> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
>> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
>> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
>> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
>>
>> ---
>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
>>  gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
>>  6 files changed, 477 insertions(+), 5 deletions(-)
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>> new file mode 100644
>> index 0000000..99c9af8
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>> @@ -0,0 +1,44 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>> +
>> +/* To test reassoc can undistribute vector bit_field_ref summation.
>> +
>> +   arg1 and arg2 are two arrays whose elements are of type vector double.
>> +   Assuming:
>> +     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
>> +     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
>> +
>> +   Then:
>> +     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
>> +
>> +   reassoc transforms
>> +
>> +     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
>> +		  + V3[0] + V3[1];
>> +
>> +   into:
>> +
>> +     T = V0 + V1 + V2 + V3
>> +     accumulator += T[0] + T[1];
>> +
>> +   Fewer bit_field_refs: only two for a vector of 128 or more bits.  */
>> +
>> +typedef double v2df __attribute__ ((vector_size (16)));
>> +double
>> +test (double accumulator, v2df arg1[], v2df arg2[])
>> +{
>> +  v2df temp;
>> +  temp = arg1[0] * arg2[0];
>> +  accumulator += temp[0] + temp[1];
>> +  temp = arg1[1] * arg2[1];
>> +  accumulator += temp[0] + temp[1];
>> +  temp = arg1[2] * arg2[2];
>> +  accumulator += temp[0] + temp[1];
>> +  temp = arg1[3] * arg2[3];
>> +  accumulator += temp[0] + temp[1];
>> +  return accumulator;
>> +}
>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>> new file mode 100644
>> index 0000000..61ed0bf5
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_float } */
>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>> +
>> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
>> +
>> +   v1, v2, v3, v4 of type vector float.
>> +
>> +   reassoc transforms
>> +
>> +     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
>> +                    v2[0] * v2[1] * v2[2] * v2[3] *
>> +                    v3[0] * v3[1] * v3[2] * v3[3] *
>> +                    v4[0] * v4[1] * v4[2] * v4[3] ;
>> +
>> +   into:
>> +
>> +     T = v1 * v2 * v3 * v4;
>> +     accumulator *= T[0] * T[1] * T[2] * T[3];
>> +
>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>> +
>> +typedef float v4si __attribute__((vector_size(16)));
>> +float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>> +  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
>> +  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
>> +  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
>> +  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
>> +  return accumulator;
>> +}
>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>> new file mode 100644
>> index 0000000..3790afc
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>> +
>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
>> +
>> +   v1, v2, v3, v4 of type vector int.
>> +
>> +   reassoc transforms
>> +
>> +     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
>> +                    v2[0] & v2[1] & v2[2] & v2[3] &
>> +                    v3[0] & v3[1] & v3[2] & v3[3] &
>> +                    v4[0] & v4[1] & v4[2] & v4[3] ;
>> +
>> +   into:
>> +
>> +     T = v1 & v2 & v3 & v4;
>> +     accumulator &= T[0] & T[1] & T[2] & T[3];
>> +
>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>> +
>> +typedef int v4si __attribute__((vector_size(16)));
>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>> +  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
>> +  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
>> +  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
>> +  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
>> +  return accumulator;
>> +}
>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>> new file mode 100644
>> index 0000000..1864aad
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>> +
>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
>> +
>> +   v1, v2, v3, v4 of type vector int.
>> +
>> +   reassoc transforms
>> +
>> +     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
>> +                    v2[0] | v2[1] | v2[2] | v2[3] |
>> +                    v3[0] | v3[1] | v3[2] | v3[3] |
>> +                    v4[0] | v4[1] | v4[2] | v4[3] ;
>> +
>> +   into:
>> +
>> +     T = v1 | v2 | v3 | v4;
>> +     accumulator |= T[0] | T[1] | T[2] | T[3];
>> +
>> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
>> +
>> +typedef int v4si __attribute__((vector_size(16)));
>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>> +  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
>> +  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
>> +  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
>> +  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
>> +  return accumulator;
>> +}
>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>> new file mode 100644
>> index 0000000..f747372
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>> @@ -0,0 +1,33 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_int } */
>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>> +
>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
>> +
>> +   v1, v2, v3, v4 of type vector int.
>> +
>> +   reassoc transforms
>> +
>> +     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
>> +                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
>> +                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
>> +                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
>> +
>> +   into:
>> +
>> +     T = v1 ^ v2 ^ v3 ^ v4;
>> +     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
>> +
>> +   Fewer bit_field_refs, only four for 128 or more bits vector.  */
>> +
>> +typedef int v4si __attribute__((vector_size(16)));
>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>> +  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
>> +  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
>> +  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
>> +  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
>> +  return accumulator;
>> +}
>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>> index e1c4dfe..a6cd85a 100644
>> --- a/gcc/tree-ssa-reassoc.c
>> +++ b/gcc/tree-ssa-reassoc.c
>> @@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
>>    return changed;
>>  }
>>  
>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
>> +    - ops_indexes: the index of vec ops* for each relavant BIT_FIELD_REF.  */
>> +struct v_info
>> +{
>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>> +  auto_vec<unsigned, 32> ops_indexes;
>> +};
>> +
>> +typedef struct v_info *v_info_ptr;
>> +
>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>> +static int
>> +unsigned_cmp (const void *p_i, const void *p_j)
>> +{
>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>> +    return 1;
>> +  else
>> +    return -1;
>> +}
>> +
>> +/* Cleanup hash map for VECTOR information.  */
>> +static void
>> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
>> +{
>> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
>> +       it != info_map.end (); ++it)
>> +    {
>> +      v_info_ptr info = (*it).second;
>> +      delete info;
>> +      (*it).second = NULL;
>> +    }
>> +}
>> +
>> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
>> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
>> +   is transformed to
>> +     Vs = (V1 + V2 + ... + Vn)
>> +     Vs[0] + Vs[1] + ... + Vs[k]
>> +
>> +   The basic steps are listed below:
>> +
>> +    1) Check the addition chain *OPS by looking those summands coming from
>> +       VECTOR bit_field_ref on VECTOR type. Put the information into
>> +       v_info_map for each satisfied summand, using VECTOR SSA_NAME as key.
>> +
>> +    2) For each key (VECTOR SSA_NAME), validate all its BIT_FIELD_REFs are
>> +       continous, they can cover the whole VECTOR perfectly without any holes.
>> +       Obtain one VECTOR list which contain candidates to be transformed.
>> +
>> +    3) Build the addition statements for all VECTOR candidates, generate
>> +       BIT_FIELD_REFs accordingly.
>> +
>> +   TODO:
>> +    1) The current implementation restrict all candidate VECTORs should have
>> +       the same VECTOR type, but it can be extended into different groups by
>> +       VECTOR types in future if any profitable cases found.
>> +    2) The current implementation requires the whole VECTORs should be fully
>> +       covered, but it can be extended to support partial, checking adjacent
>> +       but not fill the whole, it may need some cost model to define the
>> +       boundary to do or not.
>> +*/
>> +static bool
>> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
>> +			     struct loop *loop)
>> +{
>> +  if (ops->length () <= 1)
>> +    return false;
>> +
>> +  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
>> +      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
>> +    return false;
>> +
>> +  hash_map<tree, v_info_ptr> v_info_map;
>> +  operand_entry *oe1;
>> +  unsigned i;
>> +
>> +  /* Find those summands from VECTOR BIT_FIELD_REF in addition chain, put the
>> +     information into map.  */
>> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
>> +    {
>> +      enum tree_code dcode;
>> +      gimple *oe1def;
>> +
>> +      if (TREE_CODE (oe1->op) != SSA_NAME)
>> +	continue;
>> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
>> +      if (!is_gimple_assign (oe1def))
>> +	continue;
>> +      dcode = gimple_assign_rhs_code (oe1def);
>> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
>> +	continue;
>> +
>> +      tree rhs = gimple_op (oe1def, 1);
>> +      tree op0 = TREE_OPERAND (rhs, 0);
>> +      tree vec_type = TREE_TYPE (op0);
>> +
>> +      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
>> +	continue;
>> +
>> +      tree op1 = TREE_OPERAND (rhs, 1);
>> +      tree op2 = TREE_OPERAND (rhs, 2);
>> +
>> +      tree elem_type = TREE_TYPE (vec_type);
>> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>> +      if (size != TREE_INT_CST_LOW (op1))
>> +	continue;
>> +
>> +      /* Ignore it if target machine can't support this VECTOR type.  */
>> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
>> +	continue;
>> +
>> +      /* Ignore it if target machine can't support this type of VECTOR
>> +         operation.  */
>> +      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
>> +      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
>> +	continue;
>> +
>> +      v_info_ptr *info_ptr = v_info_map.get (op0);
>> +      if (info_ptr)
>> +	{
>> +	  v_info_ptr info = *info_ptr;
>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>> +	  info->ops_indexes.safe_push (i);
>> +	}
>> +      else
>> +	{
>> +	  v_info_ptr info = new v_info;
>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>> +	  info->ops_indexes.safe_push (i);
>> +	  v_info_map.put (op0, info);
>> +	}
>> +    }
>> +
>> +  /* At least two VECTOR to combine.  */
>> +  if (v_info_map.elements () <= 1)
>> +    {
>> +      cleanup_vinfo_map (v_info_map);
>> +      return false;
>> +    }
>> +
>> +  /* Use the first VECTOR and its information as the reference.
>> +     Firstly, we should validate it, that is:
>> +       1) sorted offsets are adjacent, no holes.
>> +       2) can fill the whole VECTOR perfectly.  */
>> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
>> +  tree ref_vec = (*it).first;
>> +  v_info_ptr ref_info = (*it).second;
>> +  ref_info->offsets.qsort (unsigned_cmp);
>> +  tree vec_type = TREE_TYPE (ref_vec);
>> +  tree elem_type = TREE_TYPE (vec_type);
>> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>> +  unsigned HOST_WIDE_INT curr;
>> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
>> +
>> +  /* Continous check.  */
>> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
>> +    {
>> +      if (curr != (prev + elem_size))
>> +	{
>> +	  cleanup_vinfo_map (v_info_map);
>> +	  return false;
>> +	}
>> +      prev = curr;
>> +    }
>> +
>> +  /* Check whether fill the whole.  */
>> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
>> +    {
>> +      cleanup_vinfo_map (v_info_map);
>> +      return false;
>> +    }
>> +
>> +  auto_vec<tree> vectors (v_info_map.elements ());
>> +  vectors.quick_push (ref_vec);
>> +
>> +  /* Use the ref_vec to filter others.  */
>> +  for (++it; it != v_info_map.end (); ++it)
>> +    {
>> +      tree vec = (*it).first;
>> +      v_info_ptr info = (*it).second;
>> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
>> +	continue;
>> +      if (ref_info->offsets.length () != info->offsets.length ())
>> +	continue;
>> +      bool same_offset = true;
>> +      info->offsets.qsort (unsigned_cmp);
>> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
>> +	{
>> +	  if (ref_info->offsets[i] != info->offsets[i])
>> +	    {
>> +	      same_offset = false;
>> +	      break;
>> +	    }
>> +	}
>> +      if (!same_offset)
>> +	continue;
>> +      vectors.quick_push (vec);
>> +    }
>> +
>> +  if (vectors.length () < 2)
>> +    {
>> +      cleanup_vinfo_map (v_info_map);
>> +      return false;
>> +    }
>> +
>> +  tree tr;
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
>> +      FOR_EACH_VEC_ELT (vectors, i, tr)
>> +	{
>> +	  print_generic_expr (dump_file, tr);
>> +	  fprintf (dump_file, "  ");
>> +	}
>> +      fprintf (dump_file, "\n");
>> +    }
>> +
>> +  /* Build the sum for all candidate VECTORs.  */
>> +  unsigned idx;
>> +  gimple *sum = NULL;
>> +  v_info_ptr info;
>> +  tree sum_vec = ref_vec;
>> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
>> +    {
>> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>> +      info = *(v_info_map.get (tr));
>> +      unsigned j;
>> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>> +	{
>> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>> +	  gimple_set_visited (def, true);
>> +	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
>> +	      || opcode == BIT_IOR_EXPR)
>> +	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
>> +	  else if (opcode == MULT_EXPR)
>> +	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
>> +	  else
>> +	    {
>> +	      gcc_assert (opcode == BIT_AND_EXPR);
>> +	      (*ops)[idx]->op
>> +		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
>> +	    }
>> +	  (*ops)[idx]->rank = 0;
>> +	}
>> +      sum_vec = gimple_get_lhs (sum);
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	{
>> +	  fprintf (dump_file, "Generating addition -> ");
>> +	  print_gimple_stmt (dump_file, sum, 0);
>> +	}
>> +    }
>> +
>> +  /* Using any well-formed VECTOR as the reference (here ref_vec), generate
>> +     the BIT_FIELD_REF statements accordingly.  */
>> +  info = *(v_info_map.get (ref_vec));
>> +  gcc_assert (sum);
>> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
>> +    {
>> +      tree dst = make_ssa_name (elem_type);
>> +      gimple *gs
>> +	= gimple_build_assign (dst, BIT_FIELD_REF,
>> +			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
>> +				       TYPE_SIZE (elem_type),
>> +				       bitsize_int (info->offsets[i])));
>> +      insert_stmt_after (gs, sum);
>> +      update_stmt (gs);
>> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>> +      gimple_set_visited (def, true);
>> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
>> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>> +	{
>> +	  fprintf (dump_file, "Generating bit_field_ref -> ");
>> +	  print_gimple_stmt (dump_file, gs, 0);
>> +	}
>> +    }
>> +
>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>> +    {
>> +      fprintf (dump_file, "undistributiong bit_field_ref for vector done.\n");
>> +    }
>> +
>> +  cleanup_vinfo_map (v_info_map);
>> +
>> +  return true;
>> +}
>> +
>>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>>     expression, examine the other OPS to see if any of them are comparisons
>>     of the same values, which we may be able to combine or eliminate.
>> @@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
>>  	  tree lhs, rhs1, rhs2;
>>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>>  
>> -	  /* If this is not a gimple binary expression, there is
>> -	     nothing for us to do with it.  */
>> -	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>> -	    continue;
>> -
>>  	  /* If this was part of an already processed statement,
>>  	     we don't need to touch it again. */
>>  	  if (gimple_visited_p (stmt))
>> @@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
>>  	      continue;
>>  	    }
>>  
>> +	  /* If this is not a gimple binary expression, there is
>> +	     nothing for us to do with it.  */
>> +	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>> +	    continue;
>> +
>>  	  lhs = gimple_assign_lhs (stmt);
>>  	  rhs1 = gimple_assign_rhs1 (stmt);
>>  	  rhs2 = gimple_assign_rhs2 (stmt);
>> @@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
>>  		  optimize_ops_list (rhs_code, &ops);
>>  		}
>>  
>> +	      if (undistribute_bitref_for_vector (rhs_code, &ops,
>> +						  loop_containing_stmt (stmt)))
>> +		{
>> +		  ops.qsort (sort_by_operand_rank);
>> +		  optimize_ops_list (rhs_code, &ops);
>> +		}
>> +
>>  	      if (rhs_code == PLUS_EXPR
>>  		  && transform_add_to_multiply (&ops))
>>  		ops.qsort (sort_by_operand_rank);
>>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PING^4] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-05-21  2:03   ` Kewen.Lin
@ 2019-06-11  2:46     ` Kewen.Lin
  2019-06-26  5:37       ` [PING^5] " Kewen.Lin
  0 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-06-11  2:46 UTC (permalink / raw)
  To: GCC Patches; +Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther

Hi,

Gentle ping again.  Thanks!

Kewen

on 2019/5/21 10:02 AM, Kewen.Lin wrote:
> Hi,
> 
> Gentle ping again.  Thanks!
> 
> 
> Kewen
> 
> on 2019/5/5 2:15 PM, Kewen.Lin wrote:
>> Hi,
>>
>> I'd like to gently ping this patch:
>> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html
>>
>> OK for trunk now?
>>
>> Thanks!
>>
>> on 2019/3/20 11:14 AM, Kewen.Lin wrote:
>>> Hi,
>>>
>>> Please refer to below link for previous threads.
>>> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
>>>
>>> Compared to patch v2, I've moved the vector operation target check
>>> up together with the vector type target check.  Besides, I ran
>>> bootstrap and regtest on powerpc64-linux-gnu (BE) and updated the
>>> testcases' requirements and options for robustness.
>>>
>>> Is it OK for GCC10?
>>>
>>>
>>> gcc/ChangeLog
>>>
>>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>>
>>> 	PR target/88497
>>> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
>>> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
>>> 	function undistribute_bitref_for_vector.
>>> 	(undistribute_bitref_for_vector): New function.
>>> 	(cleanup_vinfo_map): Likewise.
>>> 	(unsigned_cmp): Likewise.
>>>
>>> gcc/testsuite/ChangeLog
>>>
>>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>>
>>> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
>>> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
>>> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
>>> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
>>> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
>>>
>>> ---
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
>>>  gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
>>>  6 files changed, 477 insertions(+), 5 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>>
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>> new file mode 100644
>>> index 0000000..99c9af8
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>> @@ -0,0 +1,44 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_double } */
>>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>>> +/* { dg-options "-O2 -ffast-math" } */
>>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>> +
>>> +/* To test reassoc can undistribute vector bit_field_ref summation.
>>> +
>>> +   arg1 and arg2 are two arrays whose elements are of type vector double.
>>> +   Assuming:
>>> +     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
>>> +     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
>>> +
>>> +   Then:
>>> +     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
>>> +
>>> +   reassoc transforms
>>> +
>>> +     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
>>> +		  + V3[0] + V3[1];
>>> +
>>> +   into:
>>> +
>>> +     T = V0 + V1 + V2 + V3
>>> +     accumulator += T[0] + T[1];
>>> +
>>> +   Fewer bit_field_refs: only two for a vector of 128 or more bits.  */
>>> +
>>> +typedef double v2df __attribute__ ((vector_size (16)));
>>> +double
>>> +test (double accumulator, v2df arg1[], v2df arg2[])
>>> +{
>>> +  v2df temp;
>>> +  temp = arg1[0] * arg2[0];
>>> +  accumulator += temp[0] + temp[1];
>>> +  temp = arg1[1] * arg2[1];
>>> +  accumulator += temp[0] + temp[1];
>>> +  temp = arg1[2] * arg2[2];
>>> +  accumulator += temp[0] + temp[1];
>>> +  temp = arg1[3] * arg2[3];
>>> +  accumulator += temp[0] + temp[1];
>>> +  return accumulator;
>>> +}
>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>> new file mode 100644
>>> index 0000000..61ed0bf5
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>> @@ -0,0 +1,33 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_float } */
>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>> +/* { dg-options "-O2 -ffast-math" } */
>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>> +
>>> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
>>> +
>>> +   v1, v2, v3, v4 of type vector float.
>>> +
>>> +   reassoc transforms
>>> +
>>> +     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
>>> +                    v2[0] * v2[1] * v2[2] * v2[3] *
>>> +                    v3[0] * v3[1] * v3[2] * v3[3] *
>>> +                    v4[0] * v4[1] * v4[2] * v4[3] ;
>>> +
>>> +   into:
>>> +
>>> +     T = v1 * v2 * v3 * v4;
>>> +     accumulator *= T[0] * T[1] * T[2] * T[3];
>>> +
>>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>>> +
>>> +typedef float v4si __attribute__((vector_size(16)));
>>> +float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>> +  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
>>> +  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
>>> +  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
>>> +  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
>>> +  return accumulator;
>>> +}
>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>> new file mode 100644
>>> index 0000000..3790afc
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>> @@ -0,0 +1,33 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>> +/* { dg-options "-O2 -ffast-math" } */
>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>> +
>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
>>> +
>>> +   v1, v2, v3, v4 of type vector int.
>>> +
>>> +   reassoc transforms
>>> +
>>> +     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
>>> +                    v2[0] & v2[1] & v2[2] & v2[3] &
>>> +                    v3[0] & v3[1] & v3[2] & v3[3] &
>>> +                    v4[0] & v4[1] & v4[2] & v4[3] ;
>>> +
>>> +   into:
>>> +
>>> +     T = v1 & v2 & v3 & v4;
>>> +     accumulator &= T[0] & T[1] & T[2] & T[3];
>>> +
>>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>>> +
>>> +typedef int v4si __attribute__((vector_size(16)));
>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>> +  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
>>> +  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
>>> +  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
>>> +  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
>>> +  return accumulator;
>>> +}
>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>> new file mode 100644
>>> index 0000000..1864aad
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>> @@ -0,0 +1,33 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>> +/* { dg-options "-O2 -ffast-math" } */
>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>> +
>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
>>> +
>>> +   v1, v2, v3, v4 of type vector int.
>>> +
>>> +   reassoc transforms
>>> +
>>> +     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
>>> +                    v2[0] | v2[1] | v2[2] | v2[3] |
>>> +                    v3[0] | v3[1] | v3[2] | v3[3] |
>>> +                    v4[0] | v4[1] | v4[2] | v4[3] ;
>>> +
>>> +   into:
>>> +
>>> +     T = v1 | v2 | v3 | v4;
>>> +     accumulator |= T[0] | T[1] | T[2] | T[3];
>>> +
>>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>>> +
>>> +typedef int v4si __attribute__((vector_size(16)));
>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>> +  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
>>> +  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
>>> +  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
>>> +  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
>>> +  return accumulator;
>>> +}
>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>> new file mode 100644
>>> index 0000000..f747372
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>> @@ -0,0 +1,33 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-require-effective-target vect_int } */
>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>> +/* { dg-options "-O2 -ffast-math" } */
>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>> +
>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
>>> +
>>> +   v1, v2, v3, v4 of type vector int.
>>> +
>>> +   reassoc transforms
>>> +
>>> +     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
>>> +                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
>>> +                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
>>> +                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
>>> +
>>> +   into:
>>> +
>>> +     T = v1 ^ v2 ^ v3 ^ v4;
>>> +     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
>>> +
>>> +   Fewer bit_field_refs: only four for a vector of 128 or more bits.  */
>>> +
>>> +typedef int v4si __attribute__((vector_size(16)));
>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>> +  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
>>> +  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
>>> +  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
>>> +  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
>>> +  return accumulator;
>>> +}
>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>>> index e1c4dfe..a6cd85a 100644
>>> --- a/gcc/tree-ssa-reassoc.c
>>> +++ b/gcc/tree-ssa-reassoc.c
>>> @@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
>>>    return changed;
>>>  }
>>>  
>>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>>> +    - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
>>> +    - ops_indexes: the index into vec *OPS for each relevant BIT_FIELD_REF.  */
>>> +struct v_info
>>> +{
>>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>>> +  auto_vec<unsigned, 32> ops_indexes;
>>> +};
>>> +
>>> +typedef struct v_info *v_info_ptr;
>>> +
>>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>>> +static int
>>> +unsigned_cmp (const void *p_i, const void *p_j)
>>> +{
>>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>>> +    return 1;
>>> +  else
>>> +    return -1;
>>> +}
>>> +
>>> +/* Cleanup hash map for VECTOR information.  */
>>> +static void
>>> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
>>> +{
>>> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
>>> +       it != info_map.end (); ++it)
>>> +    {
>>> +      v_info_ptr info = (*it).second;
>>> +      delete info;
>>> +      (*it).second = NULL;
>>> +    }
>>> +}
>>> +
>>> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
>>> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
>>> +   is transformed to
>>> +     Vs = (V1 + V2 + ... + Vn)
>>> +     Vs[0] + Vs[1] + ... + Vs[k]
>>> +
>>> +   The basic steps are listed below:
>>> +
>>> +    1) Check the addition chain *OPS, looking for summands that come from
>>> +       a BIT_FIELD_REF on a VECTOR type.  Record the information in
>>> +       v_info_map for each satisfying summand, keyed by the VECTOR SSA_NAME.
>>> +
>>> +    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs
>>> +       are contiguous and cover the whole VECTOR without any holes.
>>> +       Obtain one VECTOR list containing the candidates to be transformed.
>>> +
>>> +    3) Build the addition statements for all VECTOR candidates, generate
>>> +       BIT_FIELD_REFs accordingly.
>>> +
>>> +   TODO:
>>> +    1) The current implementation requires all candidate VECTORs to have
>>> +       the same VECTOR type, but it can be extended to handle different
>>> +       groups by VECTOR type in future if profitable cases are found.
>>> +    2) The current implementation requires each VECTOR to be fully
>>> +       covered, but it can be extended to support partial coverage
>>> +       (adjacent accesses that do not fill the whole VECTOR); that may
>>> +       need a cost model to decide when it is worthwhile.
>>> +*/
>>> +static bool
>>> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
>>> +			     struct loop *loop)
>>> +{
>>> +  if (ops->length () <= 1)
>>> +    return false;
>>> +
>>> +  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
>>> +      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
>>> +    return false;
>>> +
>>> +  hash_map<tree, v_info_ptr> v_info_map;
>>> +  operand_entry *oe1;
>>> +  unsigned i;
>>> +
>>> +  /* Find those summands from VECTOR BIT_FIELD_REF in addition chain, put the
>>> +     information into map.  */
>>> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
>>> +    {
>>> +      enum tree_code dcode;
>>> +      gimple *oe1def;
>>> +
>>> +      if (TREE_CODE (oe1->op) != SSA_NAME)
>>> +	continue;
>>> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
>>> +      if (!is_gimple_assign (oe1def))
>>> +	continue;
>>> +      dcode = gimple_assign_rhs_code (oe1def);
>>> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
>>> +	continue;
>>> +
>>> +      tree rhs = gimple_op (oe1def, 1);
>>> +      tree op0 = TREE_OPERAND (rhs, 0);
>>> +      tree vec_type = TREE_TYPE (op0);
>>> +
>>> +      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
>>> +	continue;
>>> +
>>> +      tree op1 = TREE_OPERAND (rhs, 1);
>>> +      tree op2 = TREE_OPERAND (rhs, 2);
>>> +
>>> +      tree elem_type = TREE_TYPE (vec_type);
>>> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>> +      if (size != TREE_INT_CST_LOW (op1))
>>> +	continue;
>>> +
>>> +      /* Ignore it if target machine can't support this VECTOR type.  */
>>> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
>>> +	continue;
>>> +
>>> +      /* Ignore it if target machine can't support this type of VECTOR
>>> +         operation.  */
>>> +      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
>>> +      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
>>> +	continue;
>>> +
>>> +      v_info_ptr *info_ptr = v_info_map.get (op0);
>>> +      if (info_ptr)
>>> +	{
>>> +	  v_info_ptr info = *info_ptr;
>>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>> +	  info->ops_indexes.safe_push (i);
>>> +	}
>>> +      else
>>> +	{
>>> +	  v_info_ptr info = new v_info;
>>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>> +	  info->ops_indexes.safe_push (i);
>>> +	  v_info_map.put (op0, info);
>>> +	}
>>> +    }
>>> +
>>> +  /* We need at least two VECTORs to combine.  */
>>> +  if (v_info_map.elements () <= 1)
>>> +    {
>>> +      cleanup_vinfo_map (v_info_map);
>>> +      return false;
>>> +    }
>>> +
>>> +  /* Use the first VECTOR and its information as the reference.
>>> +     First, we must validate it, that is:
>>> +       1) the sorted offsets are adjacent, with no holes.
>>> +       2) they fill the whole VECTOR perfectly.  */
>>> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
>>> +  tree ref_vec = (*it).first;
>>> +  v_info_ptr ref_info = (*it).second;
>>> +  ref_info->offsets.qsort (unsigned_cmp);
>>> +  tree vec_type = TREE_TYPE (ref_vec);
>>> +  tree elem_type = TREE_TYPE (vec_type);
>>> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>> +  unsigned HOST_WIDE_INT curr;
>>> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
>>> +
>>> +  /* Contiguity check.  */
>>> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
>>> +    {
>>> +      if (curr != (prev + elem_size))
>>> +	{
>>> +	  cleanup_vinfo_map (v_info_map);
>>> +	  return false;
>>> +	}
>>> +      prev = curr;
>>> +    }
>>> +
>>> +  /* Check whether the offsets fill the whole VECTOR.  */
>>> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
>>> +    {
>>> +      cleanup_vinfo_map (v_info_map);
>>> +      return false;
>>> +    }
>>> +
>>> +  auto_vec<tree> vectors (v_info_map.elements ());
>>> +  vectors.quick_push (ref_vec);
>>> +
>>> +  /* Use the ref_vec to filter others.  */
>>> +  for (++it; it != v_info_map.end (); ++it)
>>> +    {
>>> +      tree vec = (*it).first;
>>> +      v_info_ptr info = (*it).second;
>>> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
>>> +	continue;
>>> +      if (ref_info->offsets.length () != info->offsets.length ())
>>> +	continue;
>>> +      bool same_offset = true;
>>> +      info->offsets.qsort (unsigned_cmp);
>>> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
>>> +	{
>>> +	  if (ref_info->offsets[i] != info->offsets[i])
>>> +	    {
>>> +	      same_offset = false;
>>> +	      break;
>>> +	    }
>>> +	}
>>> +      if (!same_offset)
>>> +	continue;
>>> +      vectors.quick_push (vec);
>>> +    }
>>> +
>>> +  if (vectors.length () < 2)
>>> +    {
>>> +      cleanup_vinfo_map (v_info_map);
>>> +      return false;
>>> +    }
>>> +
>>> +  tree tr;
>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +    {
>>> +      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
>>> +      FOR_EACH_VEC_ELT (vectors, i, tr)
>>> +	{
>>> +	  print_generic_expr (dump_file, tr);
>>> +	  fprintf (dump_file, "  ");
>>> +	}
>>> +      fprintf (dump_file, "\n");
>>> +    }
>>> +
>>> +  /* Build the sum for all candidate VECTORs.  */
>>> +  unsigned idx;
>>> +  gimple *sum = NULL;
>>> +  v_info_ptr info;
>>> +  tree sum_vec = ref_vec;
>>> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
>>> +    {
>>> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>>> +      info = *(v_info_map.get (tr));
>>> +      unsigned j;
>>> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>>> +	{
>>> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>> +	  gimple_set_visited (def, true);
>>> +	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
>>> +	      || opcode == BIT_IOR_EXPR)
>>> +	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
>>> +	  else if (opcode == MULT_EXPR)
>>> +	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
>>> +	  else
>>> +	    {
>>> +	      gcc_assert (opcode == BIT_AND_EXPR);
>>> +	      (*ops)[idx]->op
>>> +		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
>>> +	    }
>>> +	  (*ops)[idx]->rank = 0;
>>> +	}
>>> +      sum_vec = gimple_get_lhs (sum);
>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>> +	{
>>> +	  fprintf (dump_file, "Generating addition -> ");
>>> +	  print_gimple_stmt (dump_file, sum, 0);
>>> +	}
>>> +    }
>>> +
>>> +  /* Using any well-formed VECTOR as the reference (here ref_vec), generate
>>> +     the BIT_FIELD_REF statements accordingly.  */
>>> +  info = *(v_info_map.get (ref_vec));
>>> +  gcc_assert (sum);
>>> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
>>> +    {
>>> +      tree dst = make_ssa_name (elem_type);
>>> +      gimple *gs
>>> +	= gimple_build_assign (dst, BIT_FIELD_REF,
>>> +			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
>>> +				       TYPE_SIZE (elem_type),
>>> +				       bitsize_int (info->offsets[i])));
>>> +      insert_stmt_after (gs, sum);
>>> +      update_stmt (gs);
>>> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>> +      gimple_set_visited (def, true);
>>> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
>>> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>> +	{
>>> +	  fprintf (dump_file, "Generating bit_field_ref -> ");
>>> +	  print_gimple_stmt (dump_file, gs, 0);
>>> +	}
>>> +    }
>>> +
>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>> +    {
>>> +      fprintf (dump_file, "undistributiong bit_field_ref for vector done.\n");
>>> +    }
>>> +
>>> +  cleanup_vinfo_map (v_info_map);
>>> +
>>> +  return true;
>>> +}
>>> +
>>>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>>>     expression, examine the other OPS to see if any of them are comparisons
>>>     of the same values, which we may be able to combine or eliminate.
>>> @@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
>>>  	  tree lhs, rhs1, rhs2;
>>>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>>>  
>>> -	  /* If this is not a gimple binary expression, there is
>>> -	     nothing for us to do with it.  */
>>> -	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>> -	    continue;
>>> -
>>>  	  /* If this was part of an already processed statement,
>>>  	     we don't need to touch it again. */
>>>  	  if (gimple_visited_p (stmt))
>>> @@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
>>>  	      continue;
>>>  	    }
>>>  
>>> +	  /* If this is not a gimple binary expression, there is
>>> +	     nothing for us to do with it.  */
>>> +	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>> +	    continue;
>>> +
>>>  	  lhs = gimple_assign_lhs (stmt);
>>>  	  rhs1 = gimple_assign_rhs1 (stmt);
>>>  	  rhs2 = gimple_assign_rhs2 (stmt);
>>> @@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
>>>  		  optimize_ops_list (rhs_code, &ops);
>>>  		}
>>>  
>>> +	      if (undistribute_bitref_for_vector (rhs_code, &ops,
>>> +						  loop_containing_stmt (stmt)))
>>> +		{
>>> +		  ops.qsort (sort_by_operand_rank);
>>> +		  optimize_ops_list (rhs_code, &ops);
>>> +		}
>>> +
>>>  	      if (rhs_code == PLUS_EXPR
>>>  		  && transform_add_to_multiply (&ops))
>>>  		ops.qsort (sort_by_operand_rank);
>>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PING^5] [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-06-11  2:46     ` [PING^4] " Kewen.Lin
@ 2019-06-26  5:37       ` Kewen.Lin
  0 siblings, 0 replies; 16+ messages in thread
From: Kewen.Lin @ 2019-06-26  5:37 UTC (permalink / raw)
  To: GCC Patches
  Cc: Bill Schmidt, Segher Boessenkool, Richard Guenther, Jeff Law,
	Jakub Jelinek

Hi all,

Gentle ping for this patch:

https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html

on 2019/6/11 10:46 AM, Kewen.Lin wrote:
> Hi,
> 
> Gentle ping again.  Thanks!
> 
> Kewen
> 
> on 2019/5/21 10:02 AM, Kewen.Lin wrote:
>> Hi,
>>
>> Gentle ping again.  Thanks!
>>
>>
>> Kewen
>>
>> on 2019/5/5 2:15 PM, Kewen.Lin wrote:
>>> Hi,
>>>
>>> I'd like to gently ping this patch:
>>> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00966.html
>>>
>>> OK for trunk now?
>>>
>>> Thanks!
>>>
>>> on 2019/3/20 11:14 AM, Kewen.Lin wrote:
>>>> Hi,
>>>>
>>>> Please refer to below link for previous threads.
>>>> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
>>>>
>>>> Compared to patch v2, I've moved the vector operation target check
>>>> up together with the vector type target check.  Besides, I ran
>>>> bootstrap and regtest on powerpc64-linux-gnu (BE) and updated the
>>>> testcases' requirements and options for robustness.
>>>>
>>>> Is it OK for GCC10?
>>>>
>>>>
>>>> gcc/ChangeLog
>>>>
>>>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>>>
>>>> 	PR target/88497
>>>> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
>>>> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
>>>> 	function undistribute_bitref_for_vector.
>>>> 	(undistribute_bitref_for_vector): New function.
>>>> 	(cleanup_vinfo_map): Likewise.
>>>> 	(unsigned_cmp): Likewise.
>>>>
>>>> gcc/testsuite/ChangeLog
>>>>
>>>> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
>>>>
>>>> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
>>>> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
>>>> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
>>>> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
>>>> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
>>>>
>>>> ---
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
>>>>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
>>>>  gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
>>>>  6 files changed, 477 insertions(+), 5 deletions(-)
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>>>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>>>
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>>> new file mode 100644
>>>> index 0000000..99c9af8
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>>>> @@ -0,0 +1,44 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_double } */
>>>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>>>> +/* { dg-options "-O2 -ffast-math" } */
>>>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>>> +
>>>> +/* To test reassoc can undistribute vector bit_field_ref summation.
>>>> +
>>>> +   arg1 and arg2 are two arrays whose elements are of type vector double.
>>>> +   Assuming:
>>>> +     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
>>>> +     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
>>>> +
>>>> +   Then:
>>>> +     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
>>>> +
>>>> +   reassoc transforms
>>>> +
>>>> +     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
>>>> +		  + V3[0] + V3[1];
>>>> +
>>>> +   into:
>>>> +
>>>> +     T = V0 + V1 + V2 + V3
>>>> +     accumulator += T[0] + T[1];
>>>> +
>>>> +   Fewer bit_field_refs: only two for a vector of 128 or more bits.  */
>>>> +
>>>> +typedef double v2df __attribute__ ((vector_size (16)));
>>>> +double
>>>> +test (double accumulator, v2df arg1[], v2df arg2[])
>>>> +{
>>>> +  v2df temp;
>>>> +  temp = arg1[0] * arg2[0];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[1] * arg2[1];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[2] * arg2[2];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  temp = arg1[3] * arg2[3];
>>>> +  accumulator += temp[0] + temp[1];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>>> new file mode 100644
>>>> index 0000000..61ed0bf5
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>>>> @@ -0,0 +1,33 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_float } */
>>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>>> +/* { dg-options "-O2 -ffast-math" } */
>>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>>> +
>>>> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
>>>> +
>>>> +   v1, v2, v3, v4 of type vector float.
>>>> +
>>>> +   reassoc transforms
>>>> +
>>>> +     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
>>>> +                    v2[0] * v2[1] * v2[2] * v2[3] *
>>>> +                    v3[0] * v3[1] * v3[2] * v3[3] *
>>>> +                    v4[0] * v4[1] * v4[2] * v4[3] ;
>>>> +
>>>> +   into:
>>>> +
>>>> +     T = v1 * v2 * v3 * v4;
>>>> +     accumulator *= T[0] * T[1] * T[2] * T[3];
>>>> +
>>>> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
>>>> +
>>>> +typedef float v4si __attribute__((vector_size(16)));
>>>> +float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>>> +  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
>>>> +  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
>>>> +  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
>>>> +  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>>> new file mode 100644
>>>> index 0000000..3790afc
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>>>> @@ -0,0 +1,33 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_int } */
>>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>>> +/* { dg-options "-O2 -ffast-math" } */
>>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>>> +
>>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
>>>> +
>>>> +   v1, v2, v3, v4 of type vector int.
>>>> +
>>>> +   reassoc transforms
>>>> +
>>>> +     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
>>>> +                    v2[0] & v2[1] & v2[2] & v2[3] &
>>>> +                    v3[0] & v3[1] & v3[2] & v3[3] &
>>>> +                    v4[0] & v4[1] & v4[2] & v4[3] ;
>>>> +
>>>> +   into:
>>>> +
>>>> +     T = v1 & v2 & v3 & v4;
>>>> +     accumulator &= T[0] & T[1] & T[2] & T[3];
>>>> +
>>>> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
>>>> +
>>>> +typedef int v4si __attribute__((vector_size(16)));
>>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>>> +  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
>>>> +  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
>>>> +  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
>>>> +  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>>> new file mode 100644
>>>> index 0000000..1864aad
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>>>> @@ -0,0 +1,33 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_int } */
>>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>>> +/* { dg-options "-O2 -ffast-math" } */
>>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>>> +
>>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
>>>> +
>>>> +   v1, v2, v3, v4 of type vector int.
>>>> +
>>>> +   reassoc transforms
>>>> +
>>>> +     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
>>>> +                    v2[0] | v2[1] | v2[2] | v2[3] |
>>>> +                    v3[0] | v3[1] | v3[2] | v3[3] |
>>>> +                    v4[0] | v4[1] | v4[2] | v4[3] ;
>>>> +
>>>> +   into:
>>>> +
>>>> +     T = v1 | v2 | v3 | v4;
>>>> +     accumulator |= T[0] | T[1] | T[2] | T[3];
>>>> +
>>>> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
>>>> +
>>>> +typedef int v4si __attribute__((vector_size(16)));
>>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>>> +  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
>>>> +  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
>>>> +  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
>>>> +  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>>> new file mode 100644
>>>> index 0000000..f747372
>>>> --- /dev/null
>>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
>>>> @@ -0,0 +1,33 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-require-effective-target vect_int } */
>>>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
>>>> +/* { dg-options "-O2 -ffast-math" } */
>>>> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
>>>> +
>>>> +/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
>>>> +
>>>> +   v1, v2, v3, v4 of type vector int.
>>>> +
>>>> +   reassoc transforms
>>>> +
>>>> +     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
>>>> +                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
>>>> +                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
>>>> +                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
>>>> +
>>>> +   into:
>>>> +
>>>> +     T = v1 ^ v2 ^ v3 ^ v4;
>>>> +     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
>>>> +
>>>> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
>>>> +
>>>> +typedef int v4si __attribute__((vector_size(16)));
>>>> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
>>>> +  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
>>>> +  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
>>>> +  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
>>>> +  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
>>>> +  return accumulator;
>>>> +}
>>>> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
>>>> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
>>>> index e1c4dfe..a6cd85a 100644
>>>> --- a/gcc/tree-ssa-reassoc.c
>>>> +++ b/gcc/tree-ssa-reassoc.c
>>>> @@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
>>>>    return changed;
>>>>  }
>>>>  
>>>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>>>> +    - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
>>>> +    - ops_indexes: the index of vec ops* for each relevant BIT_FIELD_REF.  */
>>>> +struct v_info
>>>> +{
>>>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
>>>> +  auto_vec<unsigned, 32> ops_indexes;
>>>> +};
>>>> +
>>>> +typedef struct v_info *v_info_ptr;
>>>> +
>>>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>>>> +static int
>>>> +unsigned_cmp (const void *p_i, const void *p_j)
>>>> +{
>>>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>>>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>>>> +    return 1;
>>>> +  else
>>>> +    return -1;
>>>> +}
>>>> +
>>>> +/* Cleanup hash map for VECTOR information.  */
>>>> +static void
>>>> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
>>>> +{
>>>> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
>>>> +       it != info_map.end (); ++it)
>>>> +    {
>>>> +      v_info_ptr info = (*it).second;
>>>> +      delete info;
>>>> +      (*it).second = NULL;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
>>>> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
>>>> +   is transformed to
>>>> +     Vs = (V1 + V2 + ... + Vn)
>>>> +     Vs[0] + Vs[1] + ... + Vs[k]
>>>> +
>>>> +   The basic steps are listed below:
>>>> +
>>>> +    1) Check the addition chain *OPS by looking for summands coming from
>>>> +       BIT_FIELD_REFs on VECTOR type.  Record the information in
>>>> +       v_info_map for each satisfying summand, using the VECTOR SSA_NAME as key.
>>>> +
>>>> +    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs are
>>>> +       contiguous and cover the whole VECTOR perfectly without any holes.
>>>> +       Obtain one VECTOR list which contains the candidates to be transformed.
>>>> +
>>>> +    3) Build the addition statements for all VECTOR candidates, generate
>>>> +       BIT_FIELD_REFs accordingly.
>>>> +
>>>> +   TODO:
>>>> +    1) The current implementation restricts all candidate VECTORs to the
>>>> +       same VECTOR type, but it can be extended to handle different groups
>>>> +       by VECTOR type in future if any profitable cases are found.
>>>> +    2) The current implementation requires that the whole VECTOR be fully
>>>> +       covered, but it can be extended to support partial coverage, i.e.
>>>> +       adjacent elements that do not fill the whole VECTOR; that may need
>>>> +       some cost model to decide whether to do the transform.
>>>> +*/
>>>> +static bool
>>>> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
>>>> +			     struct loop *loop)
>>>> +{
>>>> +  if (ops->length () <= 1)
>>>> +    return false;
>>>> +
>>>> +  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
>>>> +      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
>>>> +    return false;
>>>> +
>>>> +  hash_map<tree, v_info_ptr> v_info_map;
>>>> +  operand_entry *oe1;
>>>> +  unsigned i;
>>>> +
>>>> +  /* Find those summands from VECTOR BIT_FIELD_REF in addition chain, put the
>>>> +     information into map.  */
>>>> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
>>>> +    {
>>>> +      enum tree_code dcode;
>>>> +      gimple *oe1def;
>>>> +
>>>> +      if (TREE_CODE (oe1->op) != SSA_NAME)
>>>> +	continue;
>>>> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
>>>> +      if (!is_gimple_assign (oe1def))
>>>> +	continue;
>>>> +      dcode = gimple_assign_rhs_code (oe1def);
>>>> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
>>>> +	continue;
>>>> +
>>>> +      tree rhs = gimple_op (oe1def, 1);
>>>> +      tree op0 = TREE_OPERAND (rhs, 0);
>>>> +      tree vec_type = TREE_TYPE (op0);
>>>> +
>>>> +      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
>>>> +	continue;
>>>> +
>>>> +      tree op1 = TREE_OPERAND (rhs, 1);
>>>> +      tree op2 = TREE_OPERAND (rhs, 2);
>>>> +
>>>> +      tree elem_type = TREE_TYPE (vec_type);
>>>> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>>> +      if (size != TREE_INT_CST_LOW (op1))
>>>> +	continue;
>>>> +
>>>> +      /* Ignore it if target machine can't support this VECTOR type.  */
>>>> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
>>>> +	continue;
>>>> +
>>>> +      /* Ignore it if target machine can't support this type of VECTOR
>>>> +         operation.  */
>>>> +      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
>>>> +      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
>>>> +	continue;
>>>> +
>>>> +      v_info_ptr *info_ptr = v_info_map.get (op0);
>>>> +      if (info_ptr)
>>>> +	{
>>>> +	  v_info_ptr info = *info_ptr;
>>>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>>> +	  info->ops_indexes.safe_push (i);
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  v_info_ptr info = new v_info;
>>>> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
>>>> +	  info->ops_indexes.safe_push (i);
>>>> +	  v_info_map.put (op0, info);
>>>> +	}
>>>> +    }
>>>> +
>>>> +  /* We need at least two VECTORs to combine.  */
>>>> +  if (v_info_map.elements () <= 1)
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  /* Use the first VECTOR and its information as the reference.
>>>> +     Firstly, we should validate it, that is:
>>>> +       1) sorted offsets are adjacent, no holes.
>>>> +       2) can fill the whole VECTOR perfectly.  */
>>>> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
>>>> +  tree ref_vec = (*it).first;
>>>> +  v_info_ptr ref_info = (*it).second;
>>>> +  ref_info->offsets.qsort (unsigned_cmp);
>>>> +  tree vec_type = TREE_TYPE (ref_vec);
>>>> +  tree elem_type = TREE_TYPE (vec_type);
>>>> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>>>> +  unsigned HOST_WIDE_INT curr;
>>>> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
>>>> +
>>>> +  /* Contiguity check.  */
>>>> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
>>>> +    {
>>>> +      if (curr != (prev + elem_size))
>>>> +	{
>>>> +	  cleanup_vinfo_map (v_info_map);
>>>> +	  return false;
>>>> +	}
>>>> +      prev = curr;
>>>> +    }
>>>> +
>>>> +  /* Check whether the refs fill the whole VECTOR.  */
>>>> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  auto_vec<tree> vectors (v_info_map.elements ());
>>>> +  vectors.quick_push (ref_vec);
>>>> +
>>>> +  /* Use the ref_vec to filter others.  */
>>>> +  for (++it; it != v_info_map.end (); ++it)
>>>> +    {
>>>> +      tree vec = (*it).first;
>>>> +      v_info_ptr info = (*it).second;
>>>> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))
>>>> +	continue;
>>>> +      if (ref_info->offsets.length () != info->offsets.length ())
>>>> +	continue;
>>>> +      bool same_offset = true;
>>>> +      info->offsets.qsort (unsigned_cmp);
>>>> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
>>>> +	{
>>>> +	  if (ref_info->offsets[i] != info->offsets[i])
>>>> +	    {
>>>> +	      same_offset = false;
>>>> +	      break;
>>>> +	    }
>>>> +	}
>>>> +      if (!same_offset)
>>>> +	continue;
>>>> +      vectors.quick_push (vec);
>>>> +    }
>>>> +
>>>> +  if (vectors.length () < 2)
>>>> +    {
>>>> +      cleanup_vinfo_map (v_info_map);
>>>> +      return false;
>>>> +    }
>>>> +
>>>> +  tree tr;
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    {
>>>> +      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
>>>> +      FOR_EACH_VEC_ELT (vectors, i, tr)
>>>> +	{
>>>> +	  print_generic_expr (dump_file, tr);
>>>> +	  fprintf (dump_file, "  ");
>>>> +	}
>>>> +      fprintf (dump_file, "\n");
>>>> +    }
>>>> +
>>>> +  /* Build the sum for all candidate VECTORs.  */
>>>> +  unsigned idx;
>>>> +  gimple *sum = NULL;
>>>> +  v_info_ptr info;
>>>> +  tree sum_vec = ref_vec;
>>>> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
>>>> +    {
>>>> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>>>> +      info = *(v_info_map.get (tr));
>>>> +      unsigned j;
>>>> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>>>> +	{
>>>> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>>> +	  gimple_set_visited (def, true);
>>>> +	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
>>>> +	      || opcode == BIT_IOR_EXPR)
>>>> +	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
>>>> +	  else if (opcode == MULT_EXPR)
>>>> +	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
>>>> +	  else
>>>> +	    {
>>>> +	      gcc_assert (opcode == BIT_AND_EXPR);
>>>> +	      (*ops)[idx]->op
>>>> +		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
>>>> +	    }
>>>> +	  (*ops)[idx]->rank = 0;
>>>> +	}
>>>> +      sum_vec = gimple_get_lhs (sum);
>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +	{
>>>> +	  fprintf (dump_file, "Generating addition -> ");
>>>> +	  print_gimple_stmt (dump_file, sum, 0);
>>>> +	}
>>>> +    }
>>>> +
>>>> +  /* Referring to any well-formed VECTOR (here using ref_vec), generate the
>>>> +     BIT_FIELD_REF statements accordingly.  */
>>>> +  info = *(v_info_map.get (ref_vec));
>>>> +  gcc_assert (sum);
>>>> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
>>>> +    {
>>>> +      tree dst = make_ssa_name (elem_type);
>>>> +      gimple *gs
>>>> +	= gimple_build_assign (dst, BIT_FIELD_REF,
>>>> +			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
>>>> +				       TYPE_SIZE (elem_type),
>>>> +				       bitsize_int (info->offsets[i])));
>>>> +      insert_stmt_after (gs, sum);
>>>> +      update_stmt (gs);
>>>> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>>>> +      gimple_set_visited (def, true);
>>>> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
>>>> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
>>>> +      if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +	{
>>>> +	  fprintf (dump_file, "Generating bit_field_ref -> ");
>>>> +	  print_gimple_stmt (dump_file, gs, 0);
>>>> +	}
>>>> +    }
>>>> +
>>>> +  if (dump_file && (dump_flags & TDF_DETAILS))
>>>> +    {
>>>> +      fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
>>>> +    }
>>>> +
>>>> +  cleanup_vinfo_map (v_info_map);
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>>>>     expression, examine the other OPS to see if any of them are comparisons
>>>>     of the same values, which we may be able to combine or eliminate.
>>>> @@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
>>>>  	  tree lhs, rhs1, rhs2;
>>>>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>>>>  
>>>> -	  /* If this is not a gimple binary expression, there is
>>>> -	     nothing for us to do with it.  */
>>>> -	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>>> -	    continue;
>>>> -
>>>>  	  /* If this was part of an already processed statement,
>>>>  	     we don't need to touch it again. */
>>>>  	  if (gimple_visited_p (stmt))
>>>> @@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
>>>>  	      continue;
>>>>  	    }
>>>>  
>>>> +	  /* If this is not a gimple binary expression, there is
>>>> +	     nothing for us to do with it.  */
>>>> +	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
>>>> +	    continue;
>>>> +
>>>>  	  lhs = gimple_assign_lhs (stmt);
>>>>  	  rhs1 = gimple_assign_rhs1 (stmt);
>>>>  	  rhs2 = gimple_assign_rhs2 (stmt);
>>>> @@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
>>>>  		  optimize_ops_list (rhs_code, &ops);
>>>>  		}
>>>>  
>>>> +	      if (undistribute_bitref_for_vector (rhs_code, &ops,
>>>> +						  loop_containing_stmt (stmt)))
>>>> +		{
>>>> +		  ops.qsort (sort_by_operand_rank);
>>>> +		  optimize_ops_list (rhs_code, &ops);
>>>> +		}
>>>> +
>>>>  	      if (rhs_code == PLUS_EXPR
>>>>  		  && transform_add_to_multiply (&ops))
>>>>  		ops.qsort (sort_by_operand_rank);
>>>>
>>
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-03-20  3:33 [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref Kewen.Lin
  2019-04-03 22:00 ` [PING] " Kewen.Lin
  2019-05-05  6:15 ` Kewen.Lin
@ 2019-07-02 12:43 ` Richard Biener
  2019-07-03  3:20   ` Kewen.Lin
  2 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2019-07-02 12:43 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool


On Wed, 20 Mar 2019, Kewen.Lin wrote:

> Hi,
> 
> Please refer to below link for previous threads.
> https://gcc.gnu.org/ml/gcc-patches/2019-03/msg00348.html
> 
> Compared to patch v2, I've moved the vector operation target 
> check up together with the vector type target check.  Besides, I
> ran bootstrap and regtest on powerpc64-linux-gnu (BE), updated 
> testcases' requirements and options for robustness.
> 
> Is it OK for GCC10?
> 
> 
> gcc/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	PR target/88497
> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
> 	function undistribute_bitref_for_vector.
> 	(undistribute_bitref_for_vector): New function.
> 	(cleanup_vinfo_map): Likewise.
> 	(unsigned_cmp): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-03-20  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c |  44 +++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c |  33 ++++
>  gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c |  33 ++++
>  gcc/tree-ssa-reassoc.c                    | 306 +++++++++++++++++++++++++++++-
>  6 files changed, 477 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> new file mode 100644
> index 0000000..99c9af8
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> @@ -0,0 +1,44 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */

Use

/* { dg-additional-options "-mvsx" { target { powerpc...

that saves duplicate typing.  I guess that x86_64/i?86 also works
if you enable SSE2 - can you check that and do adjustments accordingly?
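
Putting that suggestion together, the test header might look like the sketch below; the `-msse2` line for x86_64/i?86 is an assumption that still needs the verification asked for here:

```c
/* Sketch only: one base dg-options line shared by all targets, with
   per-target additions via dg-additional-options.  */
/* { dg-do compile } */
/* { dg-require-effective-target vect_double } */
/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
/* { dg-additional-options "-mvsx" { target { powerpc*-*-* } } } */
/* Assumed, not yet verified: SSE2 should enable the transform on x86.  */
/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
```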

> +
> +/* To test reassoc can undistribute vector bit_field_ref summation.
> +
> +   arg1 and arg2 are two arrays whose elements of type vector double.
> +   Assuming:
> +     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
> +     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
> +
> +   Then:
> +     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
> +
> +   reassoc transforms
> +
> +     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
> +		  + V3[0] + V3[1];
> +
> +   into:
> +
> +     T = V0 + V1 + V2 + V3
> +     accumulator += T[0] + T[1];
> +
> +   Fewer bit_field_refs: only two for vectors of 128 bits or wider.  */
> +
> +typedef double v2df __attribute__ ((vector_size (16)));
> +double
> +test (double accumulator, v2df arg1[], v2df arg2[])
> +{
> +  v2df temp;
> +  temp = arg1[0] * arg2[0];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[1] * arg2[1];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[2] * arg2[2];
> +  accumulator += temp[0] + temp[1];
> +  temp = arg1[3] * arg2[3];
> +  accumulator += temp[0] + temp[1];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> new file mode 100644
> index 0000000..61ed0bf5
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on multiplication.
> +
> +   v1, v2, v3, v4 of type vector float.
> +
> +   reassoc transforms
> +
> +     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
> +                    v2[0] * v2[1] * v2[2] * v2[3] *
> +                    v3[0] * v3[1] * v3[2] * v3[3] *
> +                    v4[0] * v4[1] * v4[2] * v4[3] ;
> +
> +   into:
> +
> +     T = v1 * v2 * v3 * v4;
> +     accumulator *= T[0] * T[1] * T[2] * T[3];
> +
> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
> +
> +typedef float v4si __attribute__((vector_size(16)));
> +float test(float accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
> +  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
> +  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
> +  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
> new file mode 100644
> index 0000000..3790afc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
> +                    v2[0] & v2[1] & v2[2] & v2[3] &
> +                    v3[0] & v3[1] & v3[2] & v3[3] &
> +                    v4[0] & v4[1] & v4[2] & v4[3] ;
> +
> +   into:
> +
> +     T = v1 & v2 & v3 & v4;
> +     accumulator &= T[0] & T[1] & T[2] & T[3];
> +
> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
> +  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
> +  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
> +  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
> new file mode 100644
> index 0000000..1864aad
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
> +                    v2[0] | v2[1] | v2[2] | v2[3] |
> +                    v3[0] | v3[1] | v3[2] | v3[3] |
> +                    v4[0] | v4[1] | v4[2] | v4[3] ;
> +
> +   into:
> +
> +     T = v1 | v2 | v3 | v4;
> +     accumulator |= T[0] | T[1] | T[2] | T[3];
> +
> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
> +  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
> +  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
> +  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> new file mode 100644
> index 0000000..f747372
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> +/* { dg-options "-O2 -ffast-math" } */
> +/* { dg-options "-O2 -ffast-math -maltivec -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> +
> +/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
> +
> +   v1, v2, v3, v4 of type vector int.
> +
> +   reassoc transforms
> +
> +     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
> +                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
> +                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
> +                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
> +
> +   into:
> +
> +     T = v1 ^ v2 ^ v3 ^ v4;
> +     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
> +
> +   Fewer bit_field_refs: only four for vectors of 128 bits or wider.  */
> +
> +typedef int v4si __attribute__((vector_size(16)));
> +int test(int accumulator, v4si v1, v4si v2, v4si v3, v4si v4) {
> +  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
> +  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
> +  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
> +  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
> +  return accumulator;
> +}
> +/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* } } } } */
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index e1c4dfe..a6cd85a 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -1772,6 +1772,295 @@ undistribute_ops_list (enum tree_code opcode,
>    return changed;
>  }
>  
> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> +    - offsets: the different BIT_FIELD_REF offsets accessing the same VECTOR.
> +    - ops_indexes: the index of vec ops* for each relevant BIT_FIELD_REF.  */
> +struct v_info
> +{
> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;

with SVE this probably needs to be poly_int64 or so

> +  auto_vec<unsigned, 32> ops_indexes;
> +};

To have less allocations you could use a

     auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;

or even

 hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >

where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS 
(vector_type)) during collecting and then can use quick_push ()
(ah, no - duplicates...).
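
For illustration, a minimal sketch of the suggested pair-based layout, using standard containers as stand-ins (std::vector for auto_vec, std::unordered_map for hash_map, an int key standing in for the `tree` SSA_NAME; all names here are hypothetical):

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// One (bit offset, index into the ops vector) record per BIT_FIELD_REF.
typedef std::pair<uint64_t, unsigned> elem_info;
// Keyed by the vector SSA_NAME (an int id here as a stand-in for 'tree').
typedef std::unordered_map<int, std::vector<elem_info> > v_info_map_t;

// Recording a reference is a single push of a pair, so only one vector
// grows per vector SSA_NAME instead of two behind an extra allocation.
static void
record_ref (v_info_map_t &map, int vec_id, uint64_t offset, unsigned ops_index)
{
  map[vec_id].push_back (elem_info (offset, ops_index));
}
```

Cleanup would then reduce to releasing one vector per map entry, matching the .release () use suggested above.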

> +
> +typedef struct v_info *v_info_ptr;
> +
> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
> +static int
> +unsigned_cmp (const void *p_i, const void *p_j)
> +{
> +  if (*(const unsigned HOST_WIDE_INT *) p_i
> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
> +    return 1;
> +  else
> +    return -1;

That's an issue with some qsort implementations comparing
an element against itself.

I think you should add the equality case.
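
A hedged sketch of the comparator with the equality case added (unsigned long long stands in for unsigned HOST_WIDE_INT so the snippet is self-contained):

```c
#include <stdlib.h>

/* Three-way comparison; returning 0 for equal elements keeps qsort
   implementations that compare an element against itself well-defined.  */
static int
unsigned_cmp (const void *p_i, const void *p_j)
{
  unsigned long long i = *(const unsigned long long *) p_i;
  unsigned long long j = *(const unsigned long long *) p_j;
  if (i < j)
    return -1;
  else if (i > j)
    return 1;
  return 0;
}
```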

> +}
> +
> +/* Cleanup hash map for VECTOR information.  */
> +static void
> +cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
> +{
> +  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
> +       it != info_map.end (); ++it)
> +    {
> +      v_info_ptr info = (*it).second;
> +      delete info;
> +      (*it).second = NULL;
> +    }
> +}
> +
> +/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
> +     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
> +   is transformed to
> +     Vs = (V1 + V2 + ... + Vn)
> +     Vs[0] + Vs[1] + ... + Vs[k]
> +
> +   The basic steps are listed below:
> +
> +    1) Check the addition chain *OPS by looking for summands coming from
> +       BIT_FIELD_REFs on VECTOR type.  Record the information in
> +       v_info_map for each satisfying summand, using the VECTOR SSA_NAME as key.
> +
> +    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs are
> +       contiguous and cover the whole VECTOR perfectly without any holes.
> +       Obtain one VECTOR list which contains the candidates to be transformed.
> +
> +    3) Build the addition statements for all VECTOR candidates, generate
> +       BIT_FIELD_REFs accordingly.
> +
> +   TODO:
> +    1) The current implementation restricts all candidate VECTORs to the
> +       same VECTOR type, but it can be extended to handle different groups
> +       by VECTOR type in future if any profitable cases are found.
> +    2) The current implementation requires that the whole VECTOR be fully
> +       covered, but it can be extended to support partial coverage, i.e.
> +       adjacent elements that do not fill the whole VECTOR; that may need
> +       some cost model to decide whether to do the transform.
> +*/
> +static bool
> +undistribute_bitref_for_vector (enum tree_code opcode, vec<operand_entry *> *ops,
> +			     struct loop *loop)
> +{
> +  if (ops->length () <= 1)
> +    return false;
> +
> +  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
> +      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
> +    return false;
> +
> +  hash_map<tree, v_info_ptr> v_info_map;
> +  operand_entry *oe1;
> +  unsigned i;
> +
> +  /* Find those summands from VECTOR BIT_FIELD_REF in addition chain, put the
> +     information into map.  */
> +  FOR_EACH_VEC_ELT (*ops, i, oe1)
> +    {
> +      enum tree_code dcode;
> +      gimple *oe1def;
> +
> +      if (TREE_CODE (oe1->op) != SSA_NAME)
> +	continue;
> +      oe1def = SSA_NAME_DEF_STMT (oe1->op);
> +      if (!is_gimple_assign (oe1def))
> +	continue;
> +      dcode = gimple_assign_rhs_code (oe1def);
> +      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
> +	continue;
> +
> +      tree rhs = gimple_op (oe1def, 1);

gimple_assign_rhs1 (oe1def);

> +      tree op0 = TREE_OPERAND (rhs, 0);
> +      tree vec_type = TREE_TYPE (op0);
> +
> +      if (TREE_CODE (op0) != SSA_NAME || TREE_CODE (vec_type) != VECTOR_TYPE)
> +	continue;

VECTOR_TYPE_P

> +      tree op1 = TREE_OPERAND (rhs, 1);
> +      tree op2 = TREE_OPERAND (rhs, 2);

instead of op0/1/2 use more descriptive names for the temporaries?
bf_name, bf_size, bf_offset?

> +
> +      tree elem_type = TREE_TYPE (vec_type);
> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> +      if (size != TREE_INT_CST_LOW (op1))

  if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))

avoids some of the TREE_INT_CST_LOW we like to avoid.

You miss a check for op2 % op1 being zero.  Given you store a 
HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
(beware of SVE...).

There's also a poly-int friendly multiple_p predicate so you could
have the initial checks SVE friendly but bail out on non-INTEGER_CST
offset later.
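For illustration, a standalone sketch of the validity checks under discussion (plain C, scalar sizes only, deliberately ignoring the poly-int/SVE case; the function name and types are hypothetical):

```c
#include <stdbool.h>

/* Hypothetical sketch of the BIT_FIELD_REF <vec, size, offset> checks:
   the access must read exactly one element and start on a lane
   boundary (offset % element size == 0), otherwise it does not name
   a single vector lane and must be skipped.  */
static bool
valid_lane_ref (unsigned long long elem_size, unsigned long long size,
                unsigned long long offset)
{
  if (size != elem_size)        /* must read exactly one element  */
    return false;
  if (offset % elem_size != 0)  /* must start on a lane boundary  */
    return false;
  return true;
}
```

In the patch itself the poly-int friendly multiple_p predicate would replace the plain `%` check for SVE-sized operands.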

> +	continue;
> +
> +      /* Ignore it if target machine can't support this VECTOR type.  */
> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
> +	continue;
> +
> +      /* Ignore it if target machine can't support this type of VECTOR
> +         operation.  */
> +      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
> +      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
> +	continue;
> +
> +      v_info_ptr *info_ptr = v_info_map.get (op0);
> +      if (info_ptr)
> +	{
> +	  v_info_ptr info = *info_ptr;
> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> +	  info->ops_indexes.safe_push (i);
> +	}
> +      else
> +	{
> +	  v_info_ptr info = new v_info;
> +	  info->offsets.safe_push (TREE_INT_CST_LOW (op2));
> +	  info->ops_indexes.safe_push (i);
> +	  v_info_map.put (op0, info);
> +	}
> +    }
> +
> +  /* Need at least two VECTORs to combine.  */
> +  if (v_info_map.elements () <= 1)
> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }
> +
> +  /* Use the first VECTOR and its information as the reference.
> +     Firstly, we should validate it, that is:
> +       1) sorted offsets are adjacent, no holes.
> +       2) can fill the whole VECTOR perfectly.  */
> +  hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
> +  tree ref_vec = (*it).first;
> +  v_info_ptr ref_info = (*it).second;
> +  ref_info->offsets.qsort (unsigned_cmp);
> +  tree vec_type = TREE_TYPE (ref_vec);
> +  tree elem_type = TREE_TYPE (vec_type);
> +  unsigned HOST_WIDE_INT elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));

please use tree_to_uhwi (TYPE_SIZE (elem_type)) so we will ICE instead
of miscompiling should it ever not fit.

> +  unsigned HOST_WIDE_INT curr;
> +  unsigned HOST_WIDE_INT prev = ref_info->offsets[0];
> +
> +  /* Continuity check.  */
> +  FOR_EACH_VEC_ELT_FROM (ref_info->offsets, i, curr, 1)
> +    {
> +      if (curr != (prev + elem_size))
> +	{
> +	  cleanup_vinfo_map (v_info_map);
> +	  return false;
> +	}
> +      prev = curr;
> +    }
> +
> +  /* Check whether the whole VECTOR is filled.  */
> +  if ((prev + elem_size) != TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (ref_vec))))

compare_tree_int () might be handy here.

> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }
> +
> +  auto_vec<tree> vectors (v_info_map.elements ());
> +  vectors.quick_push (ref_vec);
> +
> +  /* Use the ref_vec to filter others.  */
> +  for (++it; it != v_info_map.end (); ++it)
> +    {
> +      tree vec = (*it).first;
> +      v_info_ptr info = (*it).second;
> +      if (TREE_TYPE (ref_vec) != TREE_TYPE (vec))

types_compatible_p () otherwise you reject const v2df vs. v2df for
example.

> +	continue;
> +      if (ref_info->offsets.length () != info->offsets.length ())
> +	continue;
> +      bool same_offset = true;
> +      info->offsets.qsort (unsigned_cmp);
> +      for (unsigned i = 0; i < ref_info->offsets.length (); i++)
> +	{
> +	  if (ref_info->offsets[i] != info->offsets[i])
> +	    {
> +	      same_offset = false;
> +	      break;
> +	    }
> +	}
> +      if (!same_offset)
> +	continue;
> +      vectors.quick_push (vec);
> +    }
> +
> +  if (vectors.length () < 2)
> +    {
> +      cleanup_vinfo_map (v_info_map);
> +      return false;
> +    }

Since you are using a hashtable keyed off SSA name pointers code
generation will depend on host addresses here if you consider
there's one mismatching vector type that might become ref_vec
dependent on the order of elements in the hashtable.  That's
a no-no.

Even if doing it like above looks to possibly save compile-time
by avoiding qsort calls I think the appropriate thing to do is
to partition the vector candidates into sets of compatible
vectors (consider summing two v2df and two v4df vectors for example)
and then take out ones you disqualify because they do not cover
the vector in full.

I think whether the vector is fully covered can be done way cheaper
as well by using a sbitmap and clearing a bit whenever its 
corresponding lane is in the vector (and bailing out if the bit
was already clear since you then got a duplicate).  So start
with bitmap_ones () and at the end check bitmap_empty_p ().
I don't think you depend on the vector being sorted later?
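A minimal sketch of that sbitmap idea in plain C (a uint64_t mask standing in for GCC's sbitmap; the function name and signature are made up for illustration):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch of the suggested lane-coverage check: start with
   all lanes set, clear one bit per BIT_FIELD_REF lane, bail out on a
   duplicate (bit already clear), and succeed only when every lane was
   seen exactly once (mask empty at the end).  */
static bool
lanes_fully_covered (const unsigned *lanes, unsigned n_refs,
                     unsigned n_vector_lanes)
{
  uint64_t pending = (n_vector_lanes >= 64
                      ? ~(uint64_t) 0
                      : (((uint64_t) 1 << n_vector_lanes) - 1));
  for (unsigned i = 0; i < n_refs; i++)
    {
      uint64_t bit = (uint64_t) 1 << lanes[i];
      if (!(pending & bit))   /* duplicate lane -> bail out  */
        return false;
      pending &= ~bit;        /* mark this lane as covered  */
    }
  return pending == 0;        /* empty == whole vector covered  */
}
```

Starting from all-ones and clearing per lane makes both duplicates and holes fall out of the same loop, with no sorting needed.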


> +  tree tr;
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "The bit_field_ref vector list for undistribute: ");
> +      FOR_EACH_VEC_ELT (vectors, i, tr)
> +	{
> +	  print_generic_expr (dump_file, tr);
> +	  fprintf (dump_file, "  ");
> +	}
> +      fprintf (dump_file, "\n");
> +    }
> +
> +  /* Build the sum for all candidate VECTORs.  */
> +  unsigned idx;
> +  gimple *sum = NULL;
> +  v_info_ptr info;
> +  tree sum_vec = ref_vec;
> +  FOR_EACH_VEC_ELT_FROM (vectors, i, tr, 1)
> +    {
> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
> +      info = *(v_info_map.get (tr));
> +      unsigned j;
> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
> +	{
> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> +	  gimple_set_visited (def, true);

you set the visited flag to get the ops definition DCEd eventually, right?
Note this in a comment.

> +	  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
> +	      || opcode == BIT_IOR_EXPR)
> +	    (*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
> +	  else if (opcode == MULT_EXPR)
> +	    (*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
> +	  else
> +	    {
> +	      gcc_assert (opcode == BIT_AND_EXPR);
> +	      (*ops)[idx]->op
> +		= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
> +	    }
> +	  (*ops)[idx]->rank = 0;
> +	}
> +      sum_vec = gimple_get_lhs (sum);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fprintf (dump_file, "Generating addition -> ");
> +	  print_gimple_stmt (dump_file, sum, 0);
> +	}
> +    }
> +
> +  /* Referring to any good shape VECTOR (here using ref_vec), generate the
> +     BIT_FIELD_REF statements accordingly.  */
> +  info = *(v_info_map.get (ref_vec));
> +  gcc_assert (sum);
> +  FOR_EACH_VEC_ELT (info->ops_indexes, i, idx)
> +    {
> +      tree dst = make_ssa_name (elem_type);
> +      gimple *gs
> +	= gimple_build_assign (dst, BIT_FIELD_REF,
> +			       build3 (BIT_FIELD_REF, elem_type, sum_vec,
> +				       TYPE_SIZE (elem_type),
> +				       bitsize_int (info->offsets[i])));
> +      insert_stmt_after (gs, sum);
> +      update_stmt (gs);

The update_stmt is redundant.

> +      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> +      gimple_set_visited (def, true);

Same here - for DCE?

Otherwise looks OK to me.

> +      (*ops)[idx]->op = gimple_assign_lhs (gs);
> +      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +	{
> +	  fprintf (dump_file, "Generating bit_field_ref -> ");
> +	  print_gimple_stmt (dump_file, gs, 0);
> +	}
> +    }
> +
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
> +    }
> +
> +  cleanup_vinfo_map (v_info_map);
> +
> +  return true;
> +}
> +
>  /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
>     expression, examine the other OPS to see if any of them are comparisons
>     of the same values, which we may be able to combine or eliminate.
> @@ -5880,11 +6169,6 @@ reassociate_bb (basic_block bb)
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>  
> -	  /* If this is not a gimple binary expression, there is
> -	     nothing for us to do with it.  */
> -	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> -	    continue;
> -
>  	  /* If this was part of an already processed statement,
>  	     we don't need to touch it again. */
>  	  if (gimple_visited_p (stmt))
> @@ -5911,6 +6195,11 @@ reassociate_bb (basic_block bb)
>  	      continue;
>  	    }
>  
> +	  /* If this is not a gimple binary expression, there is
> +	     nothing for us to do with it.  */
> +	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
> +	    continue;
> +
>  	  lhs = gimple_assign_lhs (stmt);
>  	  rhs1 = gimple_assign_rhs1 (stmt);
>  	  rhs2 = gimple_assign_rhs2 (stmt);
> @@ -5950,6 +6239,13 @@ reassociate_bb (basic_block bb)
>  		  optimize_ops_list (rhs_code, &ops);
>  		}
>  
> +	      if (undistribute_bitref_for_vector (rhs_code, &ops,
> +						  loop_containing_stmt (stmt)))
> +		{
> +		  ops.qsort (sort_by_operand_rank);
> +		  optimize_ops_list (rhs_code, &ops);
> +		}
> +
>  	      if (rhs_code == PLUS_EXPR
>  		  && transform_add_to_multiply (&ops))
>  		ops.qsort (sort_by_operand_rank);
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-02 12:43 ` Richard Biener
@ 2019-07-03  3:20   ` Kewen.Lin
  2019-07-03 12:21     ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-07-03  3:20 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

Hi Richard,

Thanks very much for reviewing my patch.  I'll update it according to your comments.
Before sending the next version, I've embedded several questions below for further discussion.

on 2019/7/2 8:43 PM, Richard Biener wrote:
> On Wed, 20 Mar 2019, Kewen.Lin wrote:
> 
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
>> +/* { dg-options "-O2 -ffast-math" } */
>> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> 
> Use
> 
> /* { dg-additional-options "-mvsx" { target { powerpc...
> 
> that saves duplicate typing.  I guess that x86_64/i?86 also works
> if you enable SSE2 - can you check that and do adjustments accordingly?
> 

OK, I'll learn SSE2 and update it. 

>> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
>> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
>> +    - ops_indexes: the index of vec ops* for each relevant BIT_FIELD_REF.  */
>> +struct v_info
>> +{
>> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
> 
> with SVE this probably needs to be poly_int64 or so
> 

OK, will extend it to poly_int64 (can it be negative? or is poly_uint64 better?)

>> +  auto_vec<unsigned, 32> ops_indexes;
>> +};
> 
> To have less allocations you could use a
> 
>      auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;
> 
> or even
> 
>  hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >
> 
> where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS 
> (vector_type)) during collecting and then can use quick_push ()
> (ah, no - duplicates...).
> 

Good idea!

>> +
>> +typedef struct v_info *v_info_ptr;
>> +
>> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
>> +static int
>> +unsigned_cmp (const void *p_i, const void *p_j)
>> +{
>> +  if (*(const unsigned HOST_WIDE_INT *) p_i
>> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
>> +    return 1;
>> +  else
>> +    return -1;
> 
> That's an issue with some qsort implementations comparing
> an element against itself.
> 
> I think so you should add the equality case.
> 

The equality case seems to be already covered by ">=".
Do you mean that I need to handle the equality case explicitly?
Or return zero for "=="?  Like:

   
   const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i;
   const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j;
   if (val_i != val_j)
     return val_i > val_j? 1: -1;
   else
     return 0;

>> +
>> +   TODO:
>> +    1) The current implementation restrict all candidate VECTORs should have
>> +       the same VECTOR type, but it can be extended into different groups by
>> +       VECTOR types in future if any profitable cases found.
>> +    2) The current implementation requires the whole VECTORs should be fully
>> +       covered, but it can be extended to support partial, checking adjacent
>> +       but not fill the whole, it may need some cost model to define the
>> +       boundary to do or not.
>> +

>> +      tree elem_type = TREE_TYPE (vec_type);
>> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
>> +      if (size != TREE_INT_CST_LOW (op1))
> 
>   if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
> 
> avoids some of the TREE_INT_CST_LOW we like to avoid.
> 
> You miss a check for op2 % op1 being zero.  Given you store a 
> HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
> (beware of SVE...).

OK, thanks!  For op2 % op1 == 0, I took it for granted, I'll fix it.
I think it can be constructed artificially in IR, but since I have almost no
knowledge of other targets' vector support, I'm curious whether it exists in
real-world code.

btw, BIT_FIELD_REF in tree.def says op1/op2 are constant; does it need to be
updated due to SVE?

> 
> There's also a poly-int friendly multiple_p predicate so you could
> have the initial checks SVE friendly but bail out on non-INTEGER_CST
> offset later.
> 

Got it, thanks!

> 
> Since you are using a hashtable keyed off SSA name pointers code
> generation will depend on host addresses here if you consider
> there's one mismatching vector type that might become ref_vec
> dependent on the order of elements in the hashtable.  That's
> a no-no.
> 
> Even if doing it like above looks to possibly save compile-time
> by avoiding qsort calls I think the appropriate thing to do is
> to partition the vector candidates into sets of compatible
> vectors (consider summing two v2df and two v4df vectors for example)
> and then take out ones you disqualify because they do not cover
> the vector in full.
> 

You are absolutely right, the random order of hashtable keys could be a
problem here.  The partitioning part is TODO 1).  Since Power currently has
only one whole-vector width, without v2df and v4df co-existing, it was put
into the TODO.  I will extend it in the next version of the patch, adding
one more hashtable from vector type mode to v_info_map; feel free to
correct me if you have any concerns.

> I think whether the vector is fully covered can be done way cheaper
> as well by using a sbitmap and clearing a bit whenever its 
> corresponding lane is in the vector (and bailing out if the bit
> was already clear since you then got a duplicate).  So start
> with bitmap_ones () and at the end check bitmap_empty_p ().
> I don't think you depend on the vector being sorted later?
> 

Good idea, yes, it doesn't depend on sorted or not.

>> +    {
>> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
>> +      info = *(v_info_map.get (tr));
>> +      unsigned j;
>> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
>> +	{
>> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
>> +	  gimple_set_visited (def, true);
> 
> you set the visited flag to get the ops definition DCEd eventually, right?
> Note this in a comment.
> 

Yes, it's for that.  Will add the comment, thanks!



* Re: [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-03  3:20   ` Kewen.Lin
@ 2019-07-03 12:21     ` Richard Biener
  2019-07-08  8:14       ` [PATCH V4] " Kewen.Lin
  0 siblings, 1 reply; 16+ messages in thread
From: Richard Biener @ 2019-07-03 12:21 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 7372 bytes --]

On Wed, 3 Jul 2019, Kewen.Lin wrote:

> Hi Richard,
> 
> Thanks very much for reviewing my patch.  I'll update it as your comments.
> Before sending the next version, I've several questions embedded for further check.
> 
> on 2019/7/2 8:43 PM, Richard Biener wrote:
> > On Wed, 20 Mar 2019, Kewen.Lin wrote:
> > 
> >> +/* { dg-require-effective-target vect_double } */
> >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> >> +/* { dg-options "-O2 -ffast-math" } */
> >> +/* { dg-options "-O2 -ffast-math -mvsx -fdump-tree-reassoc1" { target { powerpc*-*-* } } } */
> > 
> > Use
> > 
> > /* { dg-additional-options "-mvsx" { target { powerpc...
> > 
> > that saves duplicate typing.  I guess that x86_64/i?86 also works
> > if you enable SSE2 - can you check that and do adjustments accordingly?
> > 
> 
> OK, I'll learn SSE2 and update it. 

I think most testcases should just pass with SSE2.

> >> +/* Hold the information of one specific VECTOR_TYPE SSA_NAME.
> >> +    - offsets: for different BIT_FIELD_REF offsets accessing same VECTOR.
> >> +    - ops_indexes: the index of vec ops* for each relevant BIT_FIELD_REF.  */
> >> +struct v_info
> >> +{
> >> +  auto_vec<unsigned HOST_WIDE_INT, 32> offsets;
> > 
> > with SVE this probably needs to be poly_int64 or so
> > 
> 
> OK, will extend it to poly_int64 (can it be negative? or poly_uint64 better?)
> 
> >> +  auto_vec<unsigned, 32> ops_indexes;
> >> +};
> > 
> > To have less allocations you could use a
> > 
> >      auto_vec<std::pair<unsigned HOST_WIDE_INT, unsigned>, 32> elements;
> > 
> > or even
> > 
> >  hash_map<tree, vec<std::pair<unsigned HOST_WIDE_INT, unsigned> > >
> > 
> > where you use .release() in the cleanup and .create (TYPE_VECTOR_SUBPARTS 
> > (vector_type)) during collecting and then can use quick_push ()
> > (ah, no - duplicates...).
> > 
> 
> Good idea!
> 
> >> +
> >> +typedef struct v_info *v_info_ptr;
> >> +
> >> +/* Comparison function for qsort on unsigned BIT_FIELD_REF offsets.  */
> >> +static int
> >> +unsigned_cmp (const void *p_i, const void *p_j)
> >> +{
> >> +  if (*(const unsigned HOST_WIDE_INT *) p_i
> >> +      >= *(const unsigned HOST_WIDE_INT *) p_j)
> >> +    return 1;
> >> +  else
> >> +    return -1;
> > 
> > That's an issue with some qsort implementations comparing
> > an element against itself.
> > 
> > I think so you should add the equality case.
> > 
> 
> The equality case seems already involved in ">=".
> Do you mean that I need to make it equality case explicitly? 
> Or return zero for "=="? like:
> 
>    
>    const unsigned HOST_WIDE_INT val_i = *(const unsigned HOST_WIDE_INT *) p_i;
>    const unsigned HOST_WIDE_INT val_j = *(const unsigned HOST_WIDE_INT *) p_j;
>    if (val_i != val_j)
>      return val_i > val_j? 1: -1;
>    else
>      return 0;

Yes.  It needs to return zero, otherwise some qsort will endlessly
swap two equal elements.
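For illustration, a standalone three-way comparator of that shape (plain C, with `unsigned long long` standing in for `unsigned HOST_WIDE_INT`; the function name is hypothetical):

```c
#include <stdlib.h>

/* Hypothetical fixed comparator: returns 0 for equal elements, since
   some qsort implementations compare an element against itself and can
   misbehave with a comparator that never returns 0.  */
static int
unsigned_cmp_fixed (const void *p_i, const void *p_j)
{
  unsigned long long a = *(const unsigned long long *) p_i;
  unsigned long long b = *(const unsigned long long *) p_j;
  if (a < b)
    return -1;
  if (a > b)
    return 1;
  return 0;
}
```

The explicit branches also avoid the overflow a subtraction-based comparator could hit on wide unsigned values.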

> >> +
> >> +   TODO:
> >> +    1) The current implementation restrict all candidate VECTORs should have
> >> +       the same VECTOR type, but it can be extended into different groups by
> >> +       VECTOR types in future if any profitable cases found.
> >> +    2) The current implementation requires the whole VECTORs should be fully
> >> +       covered, but it can be extended to support partial, checking adjacent
> >> +       but not fill the whole, it may need some cost model to define the
> >> +       boundary to do or not.
> >> +
> 
> >> +      tree elem_type = TREE_TYPE (vec_type);
> >> +      unsigned HOST_WIDE_INT size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
> >> +      if (size != TREE_INT_CST_LOW (op1))
> > 
> >   if (!tree_int_cst_equal (TYPE_SIZE (elem_type), op1))
> > 
> > avoids some of the TREE_INT_CST_LOW we like to avoid.
> > 
> > You miss a check for op2 % op1 being zero.  Given you store a 
> > HOST_WIDE_INT offset you also want to check for INTEGER_CST op1/op2
> > (beware of SVE...).
> 
> OK, thanks!  For op2 % op1 == zero, I thought it's a must for granted, I'll fix it.
> I think it can be constructed in IR artificially, but since I have almost none knowledge 
> on other targets vector support, I'm curious that does it exist in real world codes?

BIT_FIELD_REF is quite unconstrained, you could build a testcase
for the GIMPLE frontend quite easily.  Note that the first reassoc
runs before vector lowering so vector code written via vector
extensions does not necessarily have target support but will be lowered
later.

> btw, BIT_FIELD_REF in tree.def says the op1/op2 is constant, it looks need to be updated
> due to SVE?

A POLY_INT_CST is also "constant" in some sense ;)
 
> > 
> > There's also a poly-int friendly multiple_p predicate so you could
> > have the initial checks SVE friendly but bail out on non-INTEGER_CST
> > offset later.
> > 
> 
> Got it, thanks!
> 
> > 
> > Since you are using a hashtable keyed off SSA name pointers code
> > generation will depend on host addresses here if you consider
> > there's one mismatching vector type that might become ref_vec
> > dependent on the order of elements in the hashtable.  That's
> > a no-no.
> > 
> > Even if doing it like above looks to possibly save compile-time
> > by avoiding qsort calls I think the appropriate thing to do is
> > to partition the vector candidates into sets of compatible
> > vectors (consider summing two v2df and two v4df vectors for example)
> > and then take out ones you disqualify because they do not cover
> > the vector in full.
> > 
> 
> You are absolutely right, the randomness of hashtable keys order could 
> be a problem here.  The partition part is TODO 1).  Since Power has only
> one kind whole vector width now, doesn't have v2df and v4df co-existence,
> it's put into TODO.  I will extend it in the next version of patch, add
> one more hashtable from vector type mode to v_info_map, feel free to
> correct me if you have any concerns.

I think avoiding the hashtable ordering issue is enough for now,
you could simply remember the first vector you insert (thus
pick the first candidate from the original ops[] vector).

You could also do sth like

 // move hashtable elements to a vector
 for (hashtable)
   v.push (element);
 v.qsort (sort-after-mode);

and then you have chunks of candidates with the same mode.  You
can further discriminate the candidates by their SSA name version
after the mode to get a stable sort.
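A hypothetical standalone sketch of that stable ordering (plain C; the struct fields stand in for the machine mode and SSA_NAME_VERSION, and the names are made up):

```c
#include <stdlib.h>

/* Sketch of the suggested deterministic candidate ordering: qsort the
   candidates first by machine mode (grouping compatible vectors), then
   by SSA name version as a tie-breaker, so the result never depends on
   host hash-table addresses.  */
struct candidate
{
  int mode;          /* stands in for machine_mode  */
  unsigned version;  /* stands in for SSA_NAME_VERSION  */
};

static int
sort_by_mach_mode (const void *p_i, const void *p_j)
{
  const struct candidate *a = (const struct candidate *) p_i;
  const struct candidate *b = (const struct candidate *) p_j;
  if (a->mode != b->mode)
    return a->mode < b->mode ? -1 : 1;
  if (a->version != b->version)
    return a->version < b->version ? -1 : 1;
  return 0;
}
```

After the sort, candidates with the same mode form contiguous chunks that can be processed group by group.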

Richard.

> > I think whether the vector is fully covered can be done way cheaper
> > as well by using a sbitmap and clearing a bit whenever its 
> > corresponding lane is in the vector (and bailing out if the bit
> > was already clear since you then got a duplicate).  So start
> > with bitmap_ones () and at the end check bitmap_empty_p ().
> > I don't think you depend on the vector being sorted later?
> > 
> 
> Good idea, yes, it doesn't depend on sorted or not.
> 
> >> +    {
> >> +      sum = build_and_add_sum (TREE_TYPE (ref_vec), sum_vec, tr, opcode);
> >> +      info = *(v_info_map.get (tr));
> >> +      unsigned j;
> >> +      FOR_EACH_VEC_ELT (info->ops_indexes, j, idx)
> >> +	{
> >> +	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
> >> +	  gimple_set_visited (def, true);
> > 
> > you set the visited flag to get the ops definition DCEd eventually, right?
> > Note this in a comment.
> > 
> 
> Yes, it's for that.  Will add the comment, thanks!


* [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-03 12:21     ` Richard Biener
@ 2019-07-08  8:14       ` Kewen.Lin
  2019-07-08 16:56         ` Segher Boessenkool
  2019-07-10 11:54         ` Richard Biener
  0 siblings, 2 replies; 16+ messages in thread
From: Kewen.Lin @ 2019-07-08  8:14 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 1313 bytes --]

Hi Richard,

Thanks a lot for your review and reply, I've updated the patch accordingly.

Main changes compared to previous one:
  - Consider SVE (poly_int) on bit_field_ref size/offset.
  - Filter valid candidates with sbitmap first.
  - Support multiple modes (TODO 1).
  - Test cases: add SSE2 support for 1..5, update run result check for 1,
    add two more cases (5,6) to verify multiple mode support (x86).

Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu
and x86_64-linux-gnu.

Could you help to review it again?  Thanks in advance!

Kewen

----

gcc/ChangeLog

2019-07-08  Kewen Lin  <linkw@gcc.gnu.org>

	PR tree-optimization/88497
	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
	function undistribute_bitref_for_vector.
	(undistribute_bitref_for_vector): New function.
	(cleanup_vinfo_map): Likewise.
	(sort_by_mach_mode): Likewise.

gcc/testsuite/ChangeLog

2019-07-08  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.dg/tree-ssa/pr88497-1.c: New test.
	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-6.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-7.c: Likewise.



[-- Attachment #2: reassoc_v4.patch --]
[-- Type: text/plain, Size: 26108 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
new file mode 100644
index 0000000..d17de42
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-mvsx" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref summation.
+
+   arg1 and arg2 are two arrays whose elements of type vector double.
+   Assuming:
+     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
+     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
+
+   Then:
+     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
+
+   reassoc transforms
+
+     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
+          + V3[0] + V3[1];
+
+   into:
+
+     T = V0 + V1 + V2 + V3
+     accumulator += T[0] + T[1];
+
+   Fewer bit_field_refs, only two for 128 or more bits vector.  */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+__attribute__ ((noinline)) double
+test (double accumulator, v2df arg1[], v2df arg2[])
+{
+  v2df temp;
+  temp = arg1[0] * arg2[0];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[1] * arg2[1];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[2] * arg2[2];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[3] * arg2[3];
+  accumulator += temp[0] + temp[1];
+  return accumulator;
+}
+
+extern void abort (void);
+
+int
+main ()
+{
+  v2df v2[4] = {{1.0, 2.0}, {4.0, 8.0}, {1.0, 3.0}, {9.0, 27.0}};
+  v2df v3[4] = {{1.0, 4.0}, {16.0, 64.0}, {1.0, 2.0}, {3.0, 4.0}};
+  double acc = 100.0;
+  double res = test (acc, v2, v3);
+  if (res != 827.0)
+    abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
new file mode 100644
index 0000000..ce34c7a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on multiplication.
+
+   v1, v2, v3, v4 of type vector float.
+
+   reassoc transforms
+
+     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
+                    v2[0] * v2[1] * v2[2] * v2[3] *
+                    v3[0] * v3[1] * v3[2] * v3[3] *
+                    v4[0] * v4[1] * v4[2] * v4[3] ;
+
+   into:
+
+     T = v1 * v2 * v3 * v4;
+     accumulator *= T[0] * T[1] * T[2] * T[3];
+
+   Fewer bit_field_refs, only four for 128 or more bits vector.  */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+float
+test (float accumulator, v4sf v1, v4sf v2, v4sf v3, v4sf v4)
+{
+  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
+  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
+  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
+  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
new file mode 100644
index 0000000..3627aa6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
+                    v2[0] & v2[1] & v2[2] & v2[3] &
+                    v3[0] & v3[1] & v3[2] & v3[3] &
+                    v4[0] & v4[1] & v4[2] & v4[3] ;
+
+   into:
+
+     T = v1 & v2 & v3 & v4;
+     accumulator &= T[0] & T[1] & T[2] & T[3];
+
+   Fewer bit_field_refs: only four remain for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
+  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
+  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
+  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
new file mode 100644
index 0000000..6acfb60
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test that reassoc can undistribute vector bit_field_ref on bitwise IOR.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
+                    v2[0] | v2[1] | v2[2] | v2[3] |
+                    v3[0] | v3[1] | v3[2] | v3[3] |
+                    v4[0] | v4[1] | v4[2] | v4[3] ;
+
+   into:
+
+     T = v1 | v2 | v3 | v4;
+     accumulator |= T[0] | T[1] | T[2] | T[3];
+
+   Fewer bit_field_refs: only four remain for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
+  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
+  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
+  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
new file mode 100644
index 0000000..ee3041f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test that reassoc can undistribute vector bit_field_ref on bitwise XOR.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
+                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
+                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
+                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
+
+   into:
+
+     T = v1 ^ v2 ^ v3 ^ v4;
+     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
+
+   Fewer bit_field_refs: only four remain for a vector of 128 or more bits.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
+  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
+  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
+  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c
new file mode 100644
index 0000000..0146518
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f -ffast-math -fdump-tree-reassoc1" } */
+
+/* To test that reassoc can undistribute vector bit_field_ref across
+   multiple vector machine modes.
+
+   v1, v2 of type vector 4 x float
+   v3, v4 of type vector 8 x float
+   v5, v6 of type vector 16 x float
+
+   reassoc transforms
+
+     accumulator  +=  v1[0]  + v1[1]  + v1[2]  + v1[3]  +
+                      v2[0]  + v2[1]  + v2[2]  + v2[3]  +
+                      v3[0]  + v3[1]  + v3[2]  + v3[3]  +
+                      v3[4]  + v3[5]  + v3[6]  + v3[7]  +
+                      v4[0]  + v4[1]  + v4[2]  + v4[3]  +
+                      v4[4]  + v4[5]  + v4[6]  + v4[7]  +
+                      v5[0]  + v5[1]  + v5[2]  + v5[3]  +
+                      v5[4]  + v5[5]  + v5[6]  + v5[7]  +
+                      v5[8]  + v5[9]  + v5[10] + v5[11] +
+                      v5[12] + v5[13] + v5[14] + v5[15] +
+                      v6[0]  + v6[1]  + v6[2]  + v6[3]  +
+                      v6[4]  + v6[5]  + v6[6]  + v6[7]  +
+                      v6[8]  + v6[9]  + v6[10] + v6[11] +
+                      v6[12] + v6[13] + v6[14] + v6[15] ;
+
+   into:
+
+     T12 = v1 + v2;
+     T34 = v3 + v4;
+     T56 = v5 + v6;
+     accumulator += T12[0]  + T12[1]  + T12[2]  + T12[3];
+     accumulator += T34[0]  + T34[1]  + T34[2]  + T34[3];
+     accumulator += T34[4]  + T34[5]  + T34[6]  + T34[7];
+     accumulator += T56[0]  + T56[1]  + T56[2]  + T56[3];
+     accumulator += T56[4]  + T56[5]  + T56[6]  + T56[7];
+     accumulator += T56[8]  + T56[9]  + T56[10] + T56[11];
+     accumulator += T56[12] + T56[13] + T56[14] + T56[15];  */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+
+float
+test (float accumulator, v4sf v1, v4sf v2, v8sf v3, v8sf v4, v16sf v5, v16sf v6)
+{
+  accumulator += v1[0] + v1[1] + v1[2] + v1[3];
+  accumulator += v2[0] + v2[1] + v2[2] + v2[3];
+  accumulator += v3[0] + v3[1] + v3[2] + v3[3];
+  accumulator += v3[4] + v3[5] + v3[6] + v3[7];
+  accumulator += v4[0] + v4[1] + v4[2] + v4[3];
+  accumulator += v4[4] + v4[5] + v4[6] + v4[7];
+  accumulator += v5[0] + v5[1] + v5[2] + v5[3];
+  accumulator += v5[4] + v5[5] + v5[6] + v5[7];
+  accumulator += v5[8] + v5[9] + v5[10] + v5[11];
+  accumulator += v5[12] + v5[13] + v5[14] + v5[15];
+  accumulator += v6[0] + v6[1] + v6[2] + v6[3];
+  accumulator += v6[4] + v6[5] + v6[6] + v6[7];
+  accumulator += v6[8] + v6[9] + v6[10] + v6[11];
+  accumulator += v6[12] + v6[13] + v6[14] + v6[15];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 28 "reassoc1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c
new file mode 100644
index 0000000..0445878
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f_runtime } */
+/* { dg-options "-O2 -mavx512f -ffast-math -fdump-tree-reassoc1" } */
+
+/* To test that reassoc can undistribute vector bit_field_ref across
+   multiple vector machine modes, bypassing modes with only one candidate.
+
+   v1, v2 of type vector 4 x float
+   v3     of type vector 8 x float
+   v5, v6 of type vector 16 x float
+
+   reassoc transforms
+
+     accumulator  +=  v1[0]  + v1[1]  + v1[2]  + v1[3]  +
+                      v2[0]  + v2[1]  + v2[2]  + v2[3]  +
+                      v3[0]  + v3[1]  + v3[2]  + v3[3]  +
+                      v3[4]  + v3[5]  + v3[6]  + v3[7]  +
+                      v5[0]  + v5[1]  + v5[2]  + v5[3]  +
+                      v5[4]  + v5[5]  + v5[6]  + v5[7]  +
+                      v5[8]  + v5[9]  + v5[10] + v5[11] +
+                      v5[12] + v5[13] + v5[14] + v5[15] +
+                      v6[0]  + v6[1]  + v6[2]  + v6[3]  +
+                      v6[4]  + v6[5]  + v6[6]  + v6[7]  +
+                      v6[8]  + v6[9]  + v6[10] + v6[11] +
+                      v6[12] + v6[13] + v6[14] + v6[15] ;
+
+   into:
+
+     T12 = v1 + v2;
+     T56 = v5 + v6;
+     accumulator += T12[0]  + T12[1]  + T12[2]  + T12[3];
+     accumulator += v3[0]   + v3[1]   + v3[2]   + v3[3];
+     accumulator += v3[4]   + v3[5]   + v3[6]   + v3[7];
+     accumulator += T56[0]  + T56[1]  + T56[2]  + T56[3];
+     accumulator += T56[4]  + T56[5]  + T56[6]  + T56[7];
+     accumulator += T56[8]  + T56[9]  + T56[10] + T56[11];
+     accumulator += T56[12] + T56[13] + T56[14] + T56[15];  */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+
+__attribute__ ((noinline))
+float test(float accumulator, v4sf v1, v4sf v2, v8sf v3, v16sf v5, v16sf v6) {
+  accumulator += v1[0] + v1[1] + v1[2] + v1[3];
+  accumulator += v2[0] + v2[1] + v2[2] + v2[3];
+  accumulator += v3[0] + v3[1] + v3[2] + v3[3];
+  accumulator += v3[4] + v3[5] + v3[6] + v3[7];
+  accumulator += v5[0] + v5[1] + v5[2] + v5[3];
+  accumulator += v5[4] + v5[5] + v5[6] + v5[7];
+  accumulator += v5[8] + v5[9] + v5[10] + v5[11];
+  accumulator += v5[12] + v5[13] + v5[14] + v5[15];
+  accumulator += v6[0] + v6[1] + v6[2] + v6[3];
+  accumulator += v6[4] + v6[5] + v6[6] + v6[7];
+  accumulator += v6[8] + v6[9] + v6[10] + v6[11];
+  accumulator += v6[12] + v6[13] + v6[14] + v6[15];
+  return accumulator;
+}
+
+extern void abort (void);
+
+int
+main ()
+{
+  v4sf v1 = {1.0, 2.0, 3.0, 4.0 };
+  v4sf v2 = {5.0, 6.0, 7.0, 8.0 };
+  v8sf v3 = {9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0 };
+  v16sf v5 = {17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0};
+  v16sf v6 = {33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0};
+  float acc = 24.0;
+  double res = test (acc, v1, v2, v3, v5, v6);
+  if (res != 1200.0)
+    abort();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 28 "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 32bff97..e3c5264 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1772,6 +1772,282 @@ undistribute_ops_list (enum tree_code opcode,
   return changed;
 }
 
+/* Pair to hold the information of one specific VECTOR_TYPE SSA_NAME:
+   first: the element index accessed by the relevant BIT_FIELD_REF.
+   second: the index into *OPS of the relevant BIT_FIELD_REF.  */
+typedef std::pair<unsigned, unsigned> v_info_elem;
+typedef auto_vec<v_info_elem, 32> v_info;
+typedef v_info *v_info_ptr;
+
+/* Comparison function for qsort on VECTOR SSA_NAME trees by machine mode.  */
+static int
+sort_by_mach_mode (const void *p_i, const void *p_j)
+{
+  const tree tr1 = *((const tree *) p_i);
+  const tree tr2 = *((const tree *) p_j);
+  unsigned int mode1 = TYPE_MODE (TREE_TYPE (tr1));
+  unsigned int mode2 = TYPE_MODE (TREE_TYPE (tr2));
+  if (mode1 > mode2)
+    return 1;
+  else if (mode1 < mode2)
+    return -1;
+  else
+    return 0;
+}
+
+/* Cleanup hash map for VECTOR information.  */
+static void
+cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
+{
+  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
+       it != info_map.end (); ++it)
+    {
+      v_info_ptr info = (*it).second;
+      delete info;
+      (*it).second = NULL;
+    }
+}
+
+/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
+     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
+   is transformed to
+     Vs = (V1 + V2 + ... + Vn)
+     Vs[0] + Vs[1] + ... + Vs[k]
+
+   The basic steps are listed below:
+
+    1) Check the addition chain *OPS for summands that come from a
+       BIT_FIELD_REF on a VECTOR type.  Record the information in
+       v_info_map for each qualifying summand, keyed by VECTOR SSA_NAME.
+
+    2) For each key (VECTOR SSA_NAME), validate that its BIT_FIELD_REFs are
+       contiguous and cover the whole VECTOR without any holes.  Collect
+       the qualifying VECTORs into a list of transformation candidates.
+
+    3) Sort the VECTOR list by the machine mode of the VECTOR type; for each
+       group of candidates with the same mode, build the addition statements
+       and generate the BIT_FIELD_REFs accordingly.
+
+   TODO:
+       The current implementation requires each VECTOR to be fully covered,
+       but it could be extended to support partial coverage (adjacent
+       elements that do not fill the whole VECTOR); that would need some
+       cost model to decide where to draw the boundary.
+*/
+static bool
+undistribute_bitref_for_vector (enum tree_code opcode,
+				vec<operand_entry *> *ops, struct loop *loop)
+{
+  if (ops->length () <= 1)
+    return false;
+
+  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
+      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
+    return false;
+
+  hash_map<tree, v_info_ptr> v_info_map;
+  operand_entry *oe1;
+  unsigned i;
+
+  /* Find the summands in the addition chain that come from a VECTOR
+     BIT_FIELD_REF, and record the information in the map.  */
+  FOR_EACH_VEC_ELT (*ops, i, oe1)
+    {
+      enum tree_code dcode;
+      gimple *oe1def;
+
+      if (TREE_CODE (oe1->op) != SSA_NAME)
+	continue;
+      oe1def = SSA_NAME_DEF_STMT (oe1->op);
+      if (!is_gimple_assign (oe1def))
+	continue;
+      dcode = gimple_assign_rhs_code (oe1def);
+      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
+	continue;
+
+      tree rhs = gimple_assign_rhs1 (oe1def);
+      tree vec = TREE_OPERAND (rhs, 0);
+      tree vec_type = TREE_TYPE (vec);
+
+      if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type))
+	continue;
+
+      /* Check const vector type, constrain BIT_FIELD_REF offset and size.  */
+      if (!TYPE_VECTOR_SUBPARTS (vec_type).is_constant ())
+	continue;
+
+      tree elem_type = TREE_TYPE (vec_type);
+      unsigned HOST_WIDE_INT elem_size
+	= TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+      if (maybe_ne (bit_field_size (rhs), elem_size))
+	continue;
+
+      unsigned idx;
+      if (!constant_multiple_p (bit_field_offset (rhs), elem_size, &idx))
+	continue;
+
+      /* Ignore it if the target machine can't support this VECTOR type.  */
+      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
+	continue;
+
+      /* Ignore it if the target machine can't support this type of VECTOR
+	 operation.  */
+      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
+      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
+	continue;
+
+      v_info_ptr *info_ptr = v_info_map.get (vec);
+      if (info_ptr)
+	{
+	  v_info_ptr info = *info_ptr;
+	  info->safe_push (std::make_pair (idx, i));
+	}
+      else
+	{
+	  v_info_ptr info = new v_info;
+	  info->safe_push (std::make_pair (idx, i));
+	  v_info_map.put (vec, info);
+	}
+    }
+
+  /* We need at least two VECTORs to combine.  */
+  if (v_info_map.elements () <= 1)
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  /* Verify all VECTOR candidates by checking two conditions:
+       1) the sorted offsets are adjacent, with no holes.
+       2) they fill the whole VECTOR exactly.
+     Then add the valid candidates to a vector for further handling.  */
+  auto_vec<tree> valid_vecs (v_info_map.elements ());
+  unsigned mode_to_total[NUM_MACHINE_MODES];
+  memset (mode_to_total, '\0', sizeof (mode_to_total));
+  for (hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
+       it != v_info_map.end (); ++it)
+    {
+      tree cand_vec = (*it).first;
+      v_info_ptr cand_info = (*it).second;
+      unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant ();
+      sbitmap holes = sbitmap_alloc (num_elems);
+      bitmap_ones (holes);
+      bool valid = true;
+      v_info_elem *curr;
+      FOR_EACH_VEC_ELT (*cand_info, i, curr)
+	{
+	  if (!bitmap_bit_p (holes, curr->first))
+	    {
+	      valid = false;
+	      break;
+	    }
+	  else
+	    bitmap_clear_bit (holes, curr->first);
+	}
+      if (valid && bitmap_empty_p (holes))
+	{
+	  valid_vecs.quick_push (cand_vec);
+	  /* Update the total number of valid candidates with the same mode.  */
+	  mode_to_total[(int) TYPE_MODE (TREE_TYPE (cand_vec))]++;
+	}
+      sbitmap_free (holes);
+    }
+
+  valid_vecs.qsort (sort_by_mach_mode);
+  /* Go through the candidates in machine-mode order; query mode_to_total
+     for the number of candidates in each mode and skip singleton modes.  */
+  tree tvec;
+  FOR_EACH_VEC_ELT (valid_vecs, i, tvec)
+    {
+      unsigned int mode = TYPE_MODE (TREE_TYPE (tvec));
+      unsigned int count = mode_to_total[mode];
+
+      /* Build the sum for all candidates with the same mode.  */
+      if (count >= 2)
+	{
+	  unsigned int limit = i + count;
+	  gcc_assert (limit <= valid_vecs.length ());
+	  unsigned int idx, j, k;
+	  gimple *sum = NULL;
+	  v_info_ptr info_ptr;
+	  tree i_vec = valid_vecs[i];
+	  tree sum_vec = i_vec;
+	  v_info_elem *elem;
+	  for (j = i + 1; j < limit; j++)
+	    {
+	      sum = build_and_add_sum (TREE_TYPE (sum_vec), sum_vec,
+				       valid_vecs[j], opcode);
+	      sum_vec = gimple_get_lhs (sum);
+	      info_ptr = *(v_info_map.get (valid_vecs[j]));
+	      /* Update those related ops of current candidate VECTOR.  */
+	      FOR_EACH_VEC_ELT (*info_ptr, k, elem)
+		{
+		  idx = elem->second;
+		  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+		  /* Mark the def as visited so that it gets DCEd later.  */
+		  gimple_set_visited (def, true);
+		  if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
+		      || opcode == BIT_IOR_EXPR)
+		    (*ops)[idx]->op
+		      = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
+		  else if (opcode == MULT_EXPR)
+		    (*ops)[idx]->op
+		      = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
+		  else
+		    {
+		      gcc_assert (opcode == BIT_AND_EXPR);
+		      (*ops)[idx]->op
+			= build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
+		    }
+		  (*ops)[idx]->rank = 0;
+		}
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Generating addition -> ");
+		  print_gimple_stmt (dump_file, sum, 0);
+		}
+	    }
+	  /* Referring to any fully-covered VECTOR (here i_vec), generate
+	     the BIT_FIELD_REF statements accordingly.  */
+	  info_ptr = *(v_info_map.get (i_vec));
+	  gcc_assert (sum);
+	  tree elem_type = TREE_TYPE (TREE_TYPE (i_vec));
+	  FOR_EACH_VEC_ELT (*info_ptr, k, elem)
+	    {
+	      idx = elem->second;
+	      tree dst = make_ssa_name (elem_type);
+	      gimple *gs = gimple_build_assign (
+		dst, BIT_FIELD_REF,
+		build3 (BIT_FIELD_REF, elem_type, sum_vec,
+			TYPE_SIZE (elem_type),
+			bitsize_int (elem->first
+				     * tree_to_uhwi (TYPE_SIZE (elem_type)))));
+	      insert_stmt_after (gs, sum);
+	      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+	      /* Mark the def as visited so that it gets DCEd later.  */
+	      gimple_set_visited (def, true);
+	      (*ops)[idx]->op = gimple_assign_lhs (gs);
+	      (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
+	      if (dump_file && (dump_flags & TDF_DETAILS))
+		{
+		  fprintf (dump_file, "Generating bit_field_ref -> ");
+		  print_gimple_stmt (dump_file, gs, 0);
+		}
+	    }
+	  i = limit - 1;
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
+    }
+
+  cleanup_vinfo_map (v_info_map);
+
+  return true;
+}
+
 /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
    expression, examine the other OPS to see if any of them are comparisons
    of the same values, which we may be able to combine or eliminate.
@@ -5881,11 +6157,6 @@ reassociate_bb (basic_block bb)
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
-	  /* If this is not a gimple binary expression, there is
-	     nothing for us to do with it.  */
-	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
-	    continue;
-
 	  /* If this was part of an already processed statement,
 	     we don't need to touch it again. */
 	  if (gimple_visited_p (stmt))
@@ -5912,6 +6183,11 @@ reassociate_bb (basic_block bb)
 	      continue;
 	    }
 
+	  /* If this is not a gimple binary expression, there is
+	     nothing for us to do with it.  */
+	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
+	    continue;
+
 	  lhs = gimple_assign_lhs (stmt);
 	  rhs1 = gimple_assign_rhs1 (stmt);
 	  rhs2 = gimple_assign_rhs2 (stmt);
@@ -5950,7 +6226,12 @@ reassociate_bb (basic_block bb)
 		  ops.qsort (sort_by_operand_rank);
 		  optimize_ops_list (rhs_code, &ops);
 		}
-
+	      if (undistribute_bitref_for_vector (rhs_code, &ops,
+						  loop_containing_stmt (stmt)))
+		{
+		  ops.qsort (sort_by_operand_rank);
+		  optimize_ops_list (rhs_code, &ops);
+		}
 	      if (rhs_code == PLUS_EXPR
 		  && transform_add_to_multiply (&ops))
 		ops.qsort (sort_by_operand_rank);


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-08  8:14       ` [PATCH V4] " Kewen.Lin
@ 2019-07-08 16:56         ` Segher Boessenkool
  2019-07-09  2:37           ` Kewen.Lin
  2019-07-10 11:54         ` Richard Biener
  1 sibling, 1 reply; 16+ messages in thread
From: Segher Boessenkool @ 2019-07-08 16:56 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Richard Biener, GCC Patches, Bill Schmidt

Hi Kewen,

On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote:
> gcc/ChangeLog

(You have trailing spaces in the changelog, fwiw).

> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> @@ -0,0 +1,60 @@
> +/* { dg-do run } */
> +/* { dg-require-effective-target vect_double } */
> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */

For "dg-do run" tests, you need "powerpc_vsx_hw".  "_ok" only tests if
the assembler can handle VSX instructions, not whether the test system
can run them.  (powerpc_vsx_ok is what you need for "dg-do assemble" or
"dg-do link" tests.  It also tests if you can use -mvsx, but that doesn't
do what you might hope it does: you can use -mvsx together with a -mcpu=
that doesn't support VSX, for example).
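
The distinction Segher describes can be sketched as the following pairs of
DejaGnu directives (a fragment only; `vsx_hw` is the requirement name that
comes up later in this thread):

```c
/* A "dg-do run" test must gate on hardware that can execute the insns.  */
/* { dg-do run } */
/* { dg-require-effective-target vsx_hw { target { powerpc*-*-* } } } */

/* A "dg-do compile" (or assemble/link) test only needs the toolchain to
   accept the insns and the -mvsx option.  */
/* { dg-do compile } */
/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
```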

> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
> @@ -0,0 +1,37 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_float } */
> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */

This one is fine, and the rest of the tests as well.


Segher

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-08 16:56         ` Segher Boessenkool
@ 2019-07-09  2:37           ` Kewen.Lin
  2019-07-09 16:51             ` Segher Boessenkool
  0 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-07-09  2:37 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Biener, GCC Patches, Bill Schmidt

Hi Segher,

on 2019/7/9 上午12:32, Segher Boessenkool wrote:
> Hi Kewen,
> 
> On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote:
>> gcc/ChangeLog
> 
> (You have trailing spaces in the changelog, fwiw).
> 

Thanks for catching!

>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
>> @@ -0,0 +1,60 @@
>> +/* { dg-do run } */
>> +/* { dg-require-effective-target vect_double } */
>> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> 
> For "dg-do run" tests, you need "powerpc_vsx_hw".  "_ok" only tests if
> the assembler can handle VSX instructions, not whether the test system
> can run them.  (powerpc_vsx_ok is what you need for "dg-do assemble" or
> "dg-do link" tests.  It also tests if you can use -mvsx, but that doesn't
> do what you might hope it does: you can use -mvsx together with a -mcpu=
> that doesn't support VSX, for example).
> 

Thanks, I will update it.  But sorry, I can't find "powerpc_vsx_hw", only
"vsx_hw_available".  I guess that's the one you are referring to?  And I
happened to find that vect_double forces powerpc to check vsx_hw_available.

This reminds me I should use sse2 instead of sse2-runtime for cases 2-5.

>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
>> @@ -0,0 +1,37 @@
>> +/* { dg-do compile } */
>> +/* { dg-require-effective-target vect_float } */
>> +/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
> 
> This one is fine, and the rest of the tests as well.
> 
> 
> Segher
> 

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-09  2:37           ` Kewen.Lin
@ 2019-07-09 16:51             ` Segher Boessenkool
  0 siblings, 0 replies; 16+ messages in thread
From: Segher Boessenkool @ 2019-07-09 16:51 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: Richard Biener, GCC Patches, Bill Schmidt

On Tue, Jul 09, 2019 at 10:28:06AM +0800, Kewen.Lin wrote:
> on 2019/7/9 上午12:32, Segher Boessenkool wrote:
> > On Mon, Jul 08, 2019 at 04:07:00PM +0800, Kewen.Lin wrote:
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
> >> @@ -0,0 +1,60 @@
> >> +/* { dg-do run } */
> >> +/* { dg-require-effective-target vect_double } */
> >> +/* { dg-require-effective-target powerpc_vsx_ok { target { powerpc*-*-* } } } */
> > 
> > For "dg-do run" tests, you need "powerpc_vsx_hw".  "_ok" only tests if
> > the assembler can handle VSX instructions, not whether the test system
> > can run them.  (powerpc_vsx_ok is what you need for "dg-do assemble" or
> > "dg-do link" tests.  It also tests if you can use -mvsx, but that doesn't
> > do what you might hope it does: you can use -mvsx together with a -mcpu=
> > that doesn't support VSX, for example).
> 
> Thanks, I will update it.  But sorry that I can't find "powerpc_vsx_hw" but 
> "vsx_hw_available".  I guess it's the one you are referring to?

Yeah, sorry.  You can also use just "vsx_hw".

> And I happened
> to find the vect_double will force powerpc to check vsx_hw_available.

Yes :-)  So this whole line is unnecessary.


Segher

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH V4] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-08  8:14       ` [PATCH V4] " Kewen.Lin
  2019-07-08 16:56         ` Segher Boessenkool
@ 2019-07-10 11:54         ` Richard Biener
  2019-07-11 13:51           ` [PATCH V5] " Kewen.Lin
  1 sibling, 1 reply; 16+ messages in thread
From: Richard Biener @ 2019-07-10 11:54 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 3344 bytes --]

On Mon, 8 Jul 2019, Kewen.Lin wrote:

> Hi Richard,
> 
> Thanks a lot for your review and reply, I've updated the patch accordingly.
> 
> Main changes compared to previous one:
>   - Consider SVE (poly_int) on bit_field_ref size/offset.
>   - Filter valid candidates with sbitmap first.
>   - Support multiple modes (TODO 1).
>   - Test cases: add SSE2 support for 1..5, update run result check for 1,
>     add two more cases (6,7) to verify multiple mode support (x86).
> 
> Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu
> and x86_64-linux-gnu.
> 
> Could you help to review it again?  Thanks in advance!

+      tree rhs = gimple_assign_rhs1 (oe1def);
+      tree vec = TREE_OPERAND (rhs, 0);
+      tree vec_type = TREE_TYPE (vec);
+
+      if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type))
+       continue;
...
+      /* Ignore it if target machine can't support this VECTOR type.  */
+      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
+       continue;

put the check below next to the one above; it's cheaper than the
poly-int stuff that currently precedes it.

+      v_info_ptr *info_ptr = v_info_map.get (vec);
+      if (info_ptr)
+       {

there is get_or_insert which enables you to mostly merge the two cases
with a preceding

  if (!existed)
    v_info_ptr = new v_info;

also eliding the extra hash-lookup the put operation otherwise requires.
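
GCC's `hash_map::get_or_insert` probes the table once and reports through its
`existed` flag whether the key was already present.  The standalone sketch
below shows the analogous single-lookup pattern with
`std::unordered_map::try_emplace`, using an `int` key and a plain pair vector
standing in for the `tree` SSA_NAME and `v_info` of the patch (names here are
hypothetical, not GCC code):

```cpp
#include <unordered_map>
#include <utility>
#include <vector>
#include <cassert>

// Stand-in for the patch's v_info: (element index, ops index) pairs.
using v_info = std::vector<std::pair<unsigned, unsigned>>;

void
record_ref (std::unordered_map<int, v_info> &map, int vec,
	    unsigned elem_idx, unsigned op_idx)
{
  // try_emplace probes the table once; it default-constructs the mapped
  // v_info only when the key was absent (cf. the 'existed' flag of
  // hash_map::get_or_insert).  Either way we get an iterator to the entry.
  auto it = map.try_emplace (vec).first;
  it->second.emplace_back (elem_idx, op_idx);
}
```

Compared with a `get` followed by `put`, this saves the second hash lookup on
the insertion path, which is exactly the point of the review comment.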

+  for (hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
+       it != v_info_map.end (); ++it)
+    {
+      tree cand_vec = (*it).first;
+      v_info_ptr cand_info = (*it).second;
+      unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant ();

please add a quick out like

      if (cand_info->length () != num_elems)
        continue;

since to have no holes and no overlap you need exactly that many.
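
The coverage condition, including the quick length check, can be restated
standalone with `std::bitset` (a hypothetical helper, not the GCC
implementation, which uses an sbitmap of holes):

```cpp
#include <bitset>
#include <vector>
#include <cassert>

// An index list covers an N-element vector exactly when it has no
// duplicates and sets every bit.  Since that requires exactly N entries,
// a cheap length check rejects most non-covering lists before the walk.
template <unsigned N>
bool
covers_whole_vector (const std::vector<unsigned> &elem_indices)
{
  if (elem_indices.size () != N)	// the quick out suggested above
    return false;
  std::bitset<N> seen;
  for (unsigned idx : elem_indices)
    {
      if (idx >= N || seen[idx])	// out of range or duplicate
	return false;
      seen[idx] = true;
    }
  return seen.all ();
}
```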

I think you can avoid the somewhat ugly mode_to_total counter array.
Since you have sorted valid_vecs by mode, simply do

  for (unsigned i = 0; i < valid_vecs.length () - 1; ++i)
    {
      tree tvec = valid_vecs[i];
      enum machine_mode mode = TYPE_MODE (TREE_TYPE (tvec));

(please don't use unsigned int for mode!)

      /* Skip modes with only a single candidate.  */
      if (TYPE_MODE (TREE_TYPE (valid_vecs[i+1])) != mode)
        continue;

      do
        {
...
        }
      while (...)
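
This counter-free grouping can be restated standalone; the sketch below walks
a sorted `std::vector<int>` in place of the mode-sorted `valid_vecs` (the
helper and its return value are illustrative only):

```cpp
#include <vector>
#include <cassert>

// Walk a vector sorted by "mode", handling each run of two or more
// equal-mode entries and skipping runs of length one -- the loop shape
// suggested above instead of a per-mode counter array.
// Returns the number of runs that were combined, for illustration.
int
process_groups (const std::vector<int> &sorted_modes)
{
  int combined = 0;
  for (unsigned i = 0; i + 1 < sorted_modes.size (); ++i)
    {
      int mode = sorted_modes[i];
      /* Skip modes with only a single candidate.  */
      if (sorted_modes[i + 1] != mode)
	continue;
      unsigned j = i + 1;
      while (j < sorted_modes.size () && sorted_modes[j] == mode)
	++j;			// j now points past the run
      ++combined;		// combine sorted_modes[i..j-1] here
      i = j - 1;		// resume after the run
    }
  return combined;
}
```

The sort guarantees equal modes are adjacent, so peeking at `i + 1` is enough
to detect and skip singleton modes without any bookkeeping array.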

Richard.

> Kewen
> 
> ----
> 
> gcc/ChangeLog
> 
> 2019-07-08  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	PR tree-optimization/88497
> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of 
> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new 
> 	function undistribute_bitref_for_vector.
> 	(undistribute_bitref_for_vector): New function.
> 	(cleanup_vinfo_map): Likewise.
> 	(sort_by_mach_mode): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-07-08  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-6.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-7.c: Likewise.
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH V5] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-10 11:54         ` Richard Biener
@ 2019-07-11 13:51           ` Kewen.Lin
  2019-07-12 10:07             ` Richard Biener
  0 siblings, 1 reply; 16+ messages in thread
From: Kewen.Lin @ 2019-07-11 13:51 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 2957 bytes --]

Hi Richard,

on 2019/7/10 下午7:50, Richard Biener wrote:
> On Mon, 8 Jul 2019, Kewen.Lin wrote:
> 
> 
> +      tree rhs = gimple_assign_rhs1 (oe1def);
> +      tree vec = TREE_OPERAND (rhs, 0);
> +      tree vec_type = TREE_TYPE (vec);
> +
> +      if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type))
> +       continue;
> ...
> +      /* Ignore it if target machine can't support this VECTOR type.  */
> +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
> +       continue;
> 
> put the check below next to the one above, it's cheaper than the
> poly-int stuff that currently precedes it.
> 
> +      v_info_ptr *info_ptr = v_info_map.get (vec);
> +      if (info_ptr)
> +       {
> 
> there is get_or_insert which enables you to mostly merge the two cases
> with a preceding
> 
>   if (!existed)
>     v_info_ptr = new v_info;
> 
> also eliding the extra hash-lookup the put operation otherwise requires.
> 
> +  for (hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
> +       it != v_info_map.end (); ++it)
> +    {
> +      tree cand_vec = (*it).first;
> +      v_info_ptr cand_info = (*it).second;
> +      unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant ();
> 
> please add a quick out like
> 
>       if (cand_info->length () != num_elems)
>         continue;
> 
> since to have no holes and no overlap you need exactly that many.
> 
> I think you can avoid the somewhat ugly mode_to_total counter array.
> Since you have sorted valid_vecs after modes simply do
> 
>   for (unsigned i = 0; i < valid_vecs.length () - 1; ++i)
>     {
>       tree tvec = valid_vecs[i];
>       enum machine_mode mode = TYPE_MODE (TREE_TYPE (tvec));
> 
> (please don't use unsigned int for mode!)
> 
>       /* Skip modes with only a single candidate.  */
>       if (TYPE_MODE (TREE_TYPE (valid_vecs[i+1])) != mode)
>         continue;
> 
>       do
>         {
> ...
>         }
>       while (...)
> 
> Richard.
> 
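
[Editorial note: the sorted-array do/while shape Richard sketches above -- peek
at element i+1 to decide whether the current key starts a multi-member group,
then consume the rest of the group in an inner loop -- can be illustrated on
plain integers.  This is a reader's sketch, not part of the patch; the function
name count_groupable is made up for illustration.]

```c
#include <assert.h>
#include <stddef.h>

/* KEYS is sorted, so equal keys are adjacent.  Peeking at keys[i + 1]
   lets the outer loop skip keys that occur only once; the inner do/while
   consumes the remaining members of each multi-element group, mirroring
   how candidates sharing a machine mode are combined above.  Returns the
   number of elements belonging to a group of two or more.  */
static int
count_groupable (const int *keys, size_t n)
{
  int count = 0;
  for (size_t i = 0; i + 1 < n; ++i)
    {
      int key = keys[i];

      /* Skip keys with only a single occurrence.  */
      if (keys[i + 1] != key)
	continue;

      int members = 1;
      do
	{
	  members++;
	  i++;
	}
      while (i + 1 < n && keys[i + 1] == key);
      count += members;
    }
  return count;
}
```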

Thanks a lot for your comments, I've updated the patch accordingly (also
including Segher's comments on test cases).

Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu
and x86_64-linux-gnu.

Is it ok for trunk? 

Thanks,
Kewen

-----------

gcc/ChangeLog

2019-07-11  Kewen Lin  <linkw@gcc.gnu.org>

	PR tree-optimization/88497
	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of
	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new
	function undistribute_bitref_for_vector.
	(undistribute_bitref_for_vector): New function.
	(cleanup_vinfo_map): Likewise.
	(sort_by_mach_mode): Likewise.

gcc/testsuite/ChangeLog

2019-07-11  Kewen Lin  <linkw@gcc.gnu.org>

	* gcc.dg/tree-ssa/pr88497-1.c: New test.
	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-6.c: Likewise.
	* gcc.dg/tree-ssa/pr88497-7.c: Likewise.
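
[Editorial note: as context for the attachment below, here is a minimal C
sketch of the source-level effect of the undistribution, using GCC's generic
vector extensions.  It is not part of the submission; the function names are
illustrative.  For floating point, reassoc only performs this under
-ffast-math, since the additions must be reassociated.]

```c
#include <assert.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Before reassoc: eight element extractions (BIT_FIELD_REFs), one per
   element of each input vector.  */
static double
sum_elementwise (v2df a, v2df b, v2df c, v2df d)
{
  return a[0] + a[1] + b[0] + b[1] + c[0] + c[1] + d[0] + d[1];
}

/* After undistribution: a chain of vector additions, then only two
   element extractions regardless of how many vectors were summed.  */
static double
sum_undistributed (v2df a, v2df b, v2df c, v2df d)
{
  v2df t = a + b + c + d;
  return t[0] + t[1];
}
```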



[-- Attachment #2: reassoc_v5.patch --]
[-- Type: text/plain, Size: 25919 bytes --]

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
new file mode 100644
index 0000000..b6dd7ba
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-1.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vect_double } */
+/* { dg-require-effective-target vsx_hw { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2_runtime { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-mvsx" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref summation.
+
+   arg1 and arg2 are two arrays whose elements of type vector double.
+   Assuming:
+     A0 = arg1[0], A1 = arg1[1], A2 = arg1[2], A3 = arg1[3],
+     B0 = arg2[0], B1 = arg2[1], B2 = arg2[2], B3 = arg2[3],
+
+   Then:
+     V0 = A0 * B0, V1 = A1 * B1, V2 = A2 * B2, V3 = A3 * B3,
+
+   reassoc transforms
+
+     accumulator += V0[0] + V0[1] + V1[0] + V1[1] + V2[0] + V2[1]
+          + V3[0] + V3[1];
+
+   into:
+
+     T = V0 + V1 + V2 + V3
+     accumulator += T[0] + T[1];
+
+   Fewer bit_field_refs: only two for a vector of 128 bits or more.  */
+
+typedef double v2df __attribute__ ((vector_size (16)));
+__attribute__ ((noinline)) double
+test (double accumulator, v2df arg1[], v2df arg2[])
+{
+  v2df temp;
+  temp = arg1[0] * arg2[0];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[1] * arg2[1];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[2] * arg2[2];
+  accumulator += temp[0] + temp[1];
+  temp = arg1[3] * arg2[3];
+  accumulator += temp[0] + temp[1];
+  return accumulator;
+}
+
+extern void abort (void);
+
+int
+main ()
+{
+  v2df v2[4] = {{1.0, 2.0}, {4.0, 8.0}, {1.0, 3.0}, {9.0, 27.0}};
+  v2df v3[4] = {{1.0, 4.0}, {16.0, 64.0}, {1.0, 2.0}, {3.0, 4.0}};
+  double acc = 100.0;
+  double res = test (acc, v2, v3);
+  if (res != 827.0)
+    abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 2 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
new file mode 100644
index 0000000..8da5920
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-2.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_float } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on multiplication.
+
+   v1, v2, v3, v4 of type vector float.
+
+   reassoc transforms
+
+     accumulator *= v1[0] * v1[1] * v1[2] * v1[3] *
+                    v2[0] * v2[1] * v2[2] * v2[3] *
+                    v3[0] * v3[1] * v3[2] * v3[3] *
+                    v4[0] * v4[1] * v4[2] * v4[3] ;
+
+   into:
+
+     T = v1 * v2 * v3 * v4;
+     accumulator *= T[0] * T[1] * T[2] * T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 bits or more.  */
+
+typedef float v4sf __attribute__ ((vector_size (16)));
+float
+test (float accumulator, v4sf v1, v4sf v2, v4sf v3, v4sf v4)
+{
+  accumulator *= v1[0] * v1[1] * v1[2] * v1[3];
+  accumulator *= v2[0] * v2[1] * v2[2] * v2[3];
+  accumulator *= v3[0] * v3[1] * v3[2] * v3[3];
+  accumulator *= v4[0] * v4[1] * v4[2] * v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
new file mode 100644
index 0000000..449f282
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-3.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on bitwise AND.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator &= v1[0] & v1[1] & v1[2] & v1[3] &
+                    v2[0] & v2[1] & v2[2] & v2[3] &
+                    v3[0] & v3[1] & v3[2] & v3[3] &
+                    v4[0] & v4[1] & v4[2] & v4[3] ;
+
+   into:
+
+     T = v1 & v2 & v3 & v4;
+     accumulator &= T[0] & T[1] & T[2] & T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 bits or more.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator &= v1[0] & v1[1] & v1[2] & v1[3];
+  accumulator &= v2[0] & v2[1] & v2[2] & v2[3];
+  accumulator &= v3[0] & v3[1] & v3[2] & v3[3];
+  accumulator &= v4[0] & v4[1] & v4[2] & v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
new file mode 100644
index 0000000..f7b6c91
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-4.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on bitwise IOR.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator |= v1[0] | v1[1] | v1[2] | v1[3] |
+                    v2[0] | v2[1] | v2[2] | v2[3] |
+                    v3[0] | v3[1] | v3[2] | v3[3] |
+                    v4[0] | v4[1] | v4[2] | v4[3] ;
+
+   into:
+
+     T = v1 | v2 | v3 | v4;
+     accumulator |= T[0] | T[1] | T[2] | T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 bits or more.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator |= v1[0] | v1[1] | v1[2] | v1[3];
+  accumulator |= v2[0] | v2[1] | v2[2] | v2[3];
+  accumulator |= v3[0] | v3[1] | v3[2] | v3[3];
+  accumulator |= v4[0] | v4[1] | v4[2] | v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
new file mode 100644
index 0000000..cbd1224
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-5.c
@@ -0,0 +1,37 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+/* { dg-require-effective-target powerpc_altivec_ok { target { powerpc*-*-* } } } */
+/* { dg-require-effective-target sse2 { target { i?86-*-* x86_64-*-* } } } */
+/* { dg-options "-O2 -ffast-math -fdump-tree-reassoc1" } */
+/* { dg-additional-options "-maltivec" { target { powerpc*-*-* } } } */
+/* { dg-additional-options "-msse2" { target { i?86-*-* x86_64-*-* } } } */
+
+/* To test reassoc can undistribute vector bit_field_ref on bitwise XOR.
+
+   v1, v2, v3, v4 of type vector int.
+
+   reassoc transforms
+
+     accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3] ^
+                    v2[0] ^ v2[1] ^ v2[2] ^ v2[3] ^
+                    v3[0] ^ v3[1] ^ v3[2] ^ v3[3] ^
+                    v4[0] ^ v4[1] ^ v4[2] ^ v4[3] ;
+
+   into:
+
+     T = v1 ^ v2 ^ v3 ^ v4;
+     accumulator ^= T[0] ^ T[1] ^ T[2] ^ T[3];
+
+   Fewer bit_field_refs: only four for a vector of 128 bits or more.  */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+int
+test (int accumulator, v4si v1, v4si v2, v4si v3, v4si v4)
+{
+  accumulator ^= v1[0] ^ v1[1] ^ v1[2] ^ v1[3];
+  accumulator ^= v2[0] ^ v2[1] ^ v2[2] ^ v2[3];
+  accumulator ^= v3[0] ^ v3[1] ^ v3[2] ^ v3[3];
+  accumulator ^= v4[0] ^ v4[1] ^ v4[2] ^ v4[3];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 4 "reassoc1" { target { powerpc*-*-* i?86-*-* x86_64-*-* } } } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c
new file mode 100644
index 0000000..0146518
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-6.c
@@ -0,0 +1,65 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target avx512f } */
+/* { dg-options "-O2 -mavx512f -ffast-math -fdump-tree-reassoc1" } */
+
+/* To test reassoc can undistribute vector bit_field_ref on multiple
+   vector machine modes.
+
+   v1, v2 of type vector 4 x float
+   v3, v4 of type vector 8 x float
+   v5, v6 of type vector 16 x float
+
+   reassoc transforms
+
+     accumulator  +=  v1[0]  + v1[1]  + v1[2]  + v1[3]  +
+                      v2[0]  + v2[1]  + v2[2]  + v2[3]  +
+                      v3[0]  + v3[1]  + v3[2]  + v3[3]  +
+                      v3[4]  + v3[5]  + v3[6]  + v3[7]  +
+                      v4[0]  + v4[1]  + v4[2]  + v4[3]  +
+                      v4[4]  + v4[5]  + v4[6]  + v4[7]  +
+                      v5[0]  + v5[1]  + v5[2]  + v5[3]  +
+                      v5[4]  + v5[5]  + v5[6]  + v5[7]  +
+                      v5[8]  + v5[9]  + v5[10] + v5[11] +
+                      v5[12] + v5[13] + v5[14] + v5[15] +
+                      v6[0]  + v6[1]  + v6[2]  + v6[3]  +
+                      v6[4]  + v6[5]  + v6[6]  + v6[7]  +
+                      v6[8]  + v6[9]  + v6[10] + v6[11] +
+                      v6[12] + v6[13] + v6[14] + v6[15] ;
+
+   into:
+
+     T12 = v1 + v2;
+     T34 = v3 + v4;
+     T56 = v5 + v6;
+     accumulator += T12[0]  + T12[1]  + T12[2]  + T12[3];
+     accumulator += T34[0]  + T34[1]  + T34[2]  + T34[3];
+     accumulator += T34[4]  + T34[5]  + T34[6]  + T34[7];
+     accumulator += T56[0]  + T56[1]  + T56[2]  + T56[3];
+     accumulator += T56[4]  + T56[5]  + T56[6]  + T56[7];
+     accumulator += T56[8]  + T56[9]  + T56[10] + T56[11];
+     accumulator += T56[12] + T56[13] + T56[14] + T56[15];  */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+
+float
+test (float accumulator, v4sf v1, v4sf v2, v8sf v3, v8sf v4, v16sf v5, v16sf v6)
+{
+  accumulator += v1[0] + v1[1] + v1[2] + v1[3];
+  accumulator += v2[0] + v2[1] + v2[2] + v2[3];
+  accumulator += v3[0] + v3[1] + v3[2] + v3[3];
+  accumulator += v3[4] + v3[5] + v3[6] + v3[7];
+  accumulator += v4[0] + v4[1] + v4[2] + v4[3];
+  accumulator += v4[4] + v4[5] + v4[6] + v4[7];
+  accumulator += v5[0] + v5[1] + v5[2] + v5[3];
+  accumulator += v5[4] + v5[5] + v5[6] + v5[7];
+  accumulator += v5[8] + v5[9] + v5[10] + v5[11];
+  accumulator += v5[12] + v5[13] + v5[14] + v5[15];
+  accumulator += v6[0] + v6[1] + v6[2] + v6[3];
+  accumulator += v6[4] + v6[5] + v6[6] + v6[7];
+  accumulator += v6[8] + v6[9] + v6[10] + v6[11];
+  accumulator += v6[12] + v6[13] + v6[14] + v6[15];
+  return accumulator;
+}
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 28 "reassoc1" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c
new file mode 100644
index 0000000..0445878
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr88497-7.c
@@ -0,0 +1,77 @@
+/* { dg-do run } */
+/* { dg-require-effective-target avx512f_runtime } */
+/* { dg-options "-O2 -mavx512f -ffast-math -fdump-tree-reassoc1" } */
+
+/* To test reassoc can undistribute vector bit_field_ref on multiple
+   vector machine modes, bypass those modes with only one candidate.
+
+   v1, v2 of type vector 4 x float
+   v3     of type vector 8 x float
+   v5, v6 of type vector 16 x float
+
+   reassoc transforms
+
+     accumulator  +=  v1[0]  + v1[1]  + v1[2]  + v1[3]  +
+                      v2[0]  + v2[1]  + v2[2]  + v2[3]  +
+                      v3[0]  + v3[1]  + v3[2]  + v3[3]  +
+                      v3[4]  + v3[5]  + v3[6]  + v3[7]  +
+                      v5[0]  + v5[1]  + v5[2]  + v5[3]  +
+                      v5[4]  + v5[5]  + v5[6]  + v5[7]  +
+                      v5[8]  + v5[9]  + v5[10] + v5[11] +
+                      v5[12] + v5[13] + v5[14] + v5[15] +
+                      v6[0]  + v6[1]  + v6[2]  + v6[3]  +
+                      v6[4]  + v6[5]  + v6[6]  + v6[7]  +
+                      v6[8]  + v6[9]  + v6[10] + v6[11] +
+                      v6[12] + v6[13] + v6[14] + v6[15] ;
+
+   into:
+
+     T12 = v1 + v2;
+     T56 = v5 + v6;
+     accumulator += T12[0]  + T12[1]  + T12[2]  + T12[3];
+     accumulator += v3[0]   + v3[1]   + v3[2]   + v3[3];
+     accumulator += v3[4]   + v3[5]   + v3[6]   + v3[7];
+     accumulator += T56[0]  + T56[1]  + T56[2]  + T56[3];
+     accumulator += T56[4]  + T56[5]  + T56[6]  + T56[7];
+     accumulator += T56[8]  + T56[9]  + T56[10] + T56[11];
+     accumulator += T56[12] + T56[13] + T56[14] + T56[15];  */
+
+typedef float v4sf __attribute__((vector_size(16)));
+typedef float v8sf __attribute__((vector_size(32)));
+typedef float v16sf __attribute__((vector_size(64)));
+
+__attribute__ ((noinline))
+float test(float accumulator, v4sf v1, v4sf v2, v8sf v3, v16sf v5, v16sf v6) {
+  accumulator += v1[0] + v1[1] + v1[2] + v1[3];
+  accumulator += v2[0] + v2[1] + v2[2] + v2[3];
+  accumulator += v3[0] + v3[1] + v3[2] + v3[3];
+  accumulator += v3[4] + v3[5] + v3[6] + v3[7];
+  accumulator += v5[0] + v5[1] + v5[2] + v5[3];
+  accumulator += v5[4] + v5[5] + v5[6] + v5[7];
+  accumulator += v5[8] + v5[9] + v5[10] + v5[11];
+  accumulator += v5[12] + v5[13] + v5[14] + v5[15];
+  accumulator += v6[0] + v6[1] + v6[2] + v6[3];
+  accumulator += v6[4] + v6[5] + v6[6] + v6[7];
+  accumulator += v6[8] + v6[9] + v6[10] + v6[11];
+  accumulator += v6[12] + v6[13] + v6[14] + v6[15];
+  return accumulator;
+}
+
+extern void abort (void);
+
+int
+main ()
+{
+  v4sf v1 = {1.0, 2.0, 3.0, 4.0 };
+  v4sf v2 = {5.0, 6.0, 7.0, 8.0 };
+  v8sf v3 = {9.0, 10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0 };
+  v16sf v5 = {17.0, 18.0, 19.0, 20.0, 21.0, 22.0, 23.0, 24.0, 25.0, 26.0, 27.0, 28.0, 29.0, 30.0, 31.0, 32.0};
+  v16sf v6 = {33.0, 34.0, 35.0, 36.0, 37.0, 38.0, 39.0, 40.0, 41.0, 42.0, 43.0, 44.0, 45.0, 46.0, 47.0, 48.0};
+  float acc = 24.0;
+  double res = test (acc, v1, v2, v3, v5, v6);
+  if (res != 1200.0)
+    abort();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "BIT_FIELD_REF" 28 "reassoc1" } } */
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 32bff97..fdb974f 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -1772,6 +1772,274 @@ undistribute_ops_list (enum tree_code opcode,
   return changed;
 }
 
+/* Pair to hold the information of one specific VECTOR_TYPE SSA_NAME:
+   first: element index for each relevant BIT_FIELD_REF.
+   second: the index into vec *OPS for each relevant BIT_FIELD_REF.  */
+typedef std::pair<unsigned, unsigned> v_info_elem;
+typedef auto_vec<v_info_elem, 32> v_info;
+typedef v_info *v_info_ptr;
+
+/* Comparison function for qsort on VECTOR SSA_NAME trees by machine mode.  */
+static int
+sort_by_mach_mode (const void *p_i, const void *p_j)
+{
+  const tree tr1 = *((const tree *) p_i);
+  const tree tr2 = *((const tree *) p_j);
+  unsigned int mode1 = TYPE_MODE (TREE_TYPE (tr1));
+  unsigned int mode2 = TYPE_MODE (TREE_TYPE (tr2));
+  if (mode1 > mode2)
+    return 1;
+  else if (mode1 < mode2)
+    return -1;
+  else
+    return 0;
+}
+
+/* Cleanup hash map for VECTOR information.  */
+static void
+cleanup_vinfo_map (hash_map<tree, v_info_ptr> &info_map)
+{
+  for (hash_map<tree, v_info_ptr>::iterator it = info_map.begin ();
+       it != info_map.end (); ++it)
+    {
+      v_info_ptr info = (*it).second;
+      delete info;
+      (*it).second = NULL;
+    }
+}
+
+/* Perform un-distribution of BIT_FIELD_REF on VECTOR_TYPE.
+     V1[0] + V1[1] + ... + V1[k] + V2[0] + V2[1] + ... + V2[k] + ... Vn[k]
+   is transformed to
+     Vs = (V1 + V2 + ... + Vn)
+     Vs[0] + Vs[1] + ... + Vs[k]
+
+   The basic steps are listed below:
+
+    1) Check the addition chain *OPS, looking for summands that come from a
+       BIT_FIELD_REF on a VECTOR type.  Put the information into v_info_map
+       for each satisfying summand, using the VECTOR SSA_NAME as the key.
+
+    2) For each key (VECTOR SSA_NAME), validate that all its BIT_FIELD_REFs
+       are contiguous and cover the whole VECTOR perfectly without any holes.
+       Obtain one VECTOR list which contains candidates to be transformed.
+
+    3) Sort the VECTOR list by the machine mode of the VECTOR type.  For each
+       group of candidates with the same mode, build the addition statements
+       for them and generate BIT_FIELD_REFs accordingly.
+
+   TODO:
+       The current implementation requires each whole VECTOR to be fully
+       covered.  It could be extended to support partial coverage: adjacent
+       references that do not fill the whole VECTOR, which would need some
+       cost model to decide where to draw the boundary.
+*/
+static bool
+undistribute_bitref_for_vector (enum tree_code opcode,
+				vec<operand_entry *> *ops, struct loop *loop)
+{
+  if (ops->length () <= 1)
+    return false;
+
+  if (opcode != PLUS_EXPR && opcode != MULT_EXPR && opcode != BIT_XOR_EXPR
+      && opcode != BIT_IOR_EXPR && opcode != BIT_AND_EXPR)
+    return false;
+
+  hash_map<tree, v_info_ptr> v_info_map;
+  operand_entry *oe1;
+  unsigned i;
+
+  /* Find the summands that come from a VECTOR BIT_FIELD_REF in the addition
+     chain, and record the information in the map.  */
+  FOR_EACH_VEC_ELT (*ops, i, oe1)
+    {
+      enum tree_code dcode;
+      gimple *oe1def;
+
+      if (TREE_CODE (oe1->op) != SSA_NAME)
+	continue;
+      oe1def = SSA_NAME_DEF_STMT (oe1->op);
+      if (!is_gimple_assign (oe1def))
+	continue;
+      dcode = gimple_assign_rhs_code (oe1def);
+      if (dcode != BIT_FIELD_REF || !is_reassociable_op (oe1def, dcode, loop))
+	continue;
+
+      tree rhs = gimple_assign_rhs1 (oe1def);
+      tree vec = TREE_OPERAND (rhs, 0);
+      tree vec_type = TREE_TYPE (vec);
+
+      if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type))
+	continue;
+
+      /* Ignore it if target machine can't support this VECTOR type.  */
+      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
+	continue;
+
+      /* Check const vector type, constrain BIT_FIELD_REF offset and size.  */
+      if (!TYPE_VECTOR_SUBPARTS (vec_type).is_constant ())
+	continue;
+
+      tree elem_type = TREE_TYPE (vec_type);
+      unsigned HOST_WIDE_INT elem_size
+	= TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
+      if (maybe_ne (bit_field_size (rhs), elem_size))
+	continue;
+
+      unsigned idx;
+      if (!constant_multiple_p (bit_field_offset (rhs), elem_size, &idx))
+	continue;
+
+      /* Ignore it if target machine can't support this type of VECTOR
+         operation.  */
+      optab op_tab = optab_for_tree_code (opcode, vec_type, optab_vector);
+      if (optab_handler (op_tab, TYPE_MODE (vec_type)) == CODE_FOR_nothing)
+	continue;
+
+      bool existed;
+      v_info_ptr &info = v_info_map.get_or_insert (vec, &existed);
+      if (!existed)
+	info = new v_info;
+      info->safe_push (std::make_pair (idx, i));
+    }
+
+  /* We need at least two VECTORs to combine.  */
+  if (v_info_map.elements () <= 1)
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  /* Verify all VECTOR candidates by checking two conditions:
+       1) sorted offsets are adjacent, no holes.
+       2) can fill the whole VECTOR perfectly.
+     And add the valid candidates to a vector for further handling.  */
+  auto_vec<tree> valid_vecs (v_info_map.elements ());
+  for (hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
+       it != v_info_map.end (); ++it)
+    {
+      tree cand_vec = (*it).first;
+      v_info_ptr cand_info = (*it).second;
+      unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant ();
+      if (cand_info->length () != num_elems)
+	continue;
+      sbitmap holes = sbitmap_alloc (num_elems);
+      bitmap_ones (holes);
+      bool valid = true;
+      v_info_elem *curr;
+      FOR_EACH_VEC_ELT (*cand_info, i, curr)
+	{
+	  if (!bitmap_bit_p (holes, curr->first))
+	    {
+	      valid = false;
+	      break;
+	    }
+	  else
+	    bitmap_clear_bit (holes, curr->first);
+	}
+      if (valid && bitmap_empty_p (holes))
+	valid_vecs.quick_push (cand_vec);
+      sbitmap_free (holes);
+    }
+
+  /* We need at least two VECTORs to combine.  */
+  if (valid_vecs.length () <= 1)
+    {
+      cleanup_vinfo_map (v_info_map);
+      return false;
+    }
+
+  valid_vecs.qsort (sort_by_mach_mode);
+  /* Go through all candidates in machine mode order; after sorting, all
+     candidates with the same mode are adjacent, so single ones are skipped.  */
+  for (unsigned i = 0; i < valid_vecs.length () - 1; ++i)
+    {
+      tree tvec = valid_vecs[i];
+      enum machine_mode mode = TYPE_MODE (TREE_TYPE (tvec));
+
+      /* Skip modes with only a single candidate.  */
+      if (TYPE_MODE (TREE_TYPE (valid_vecs[i + 1])) != mode)
+	continue;
+
+      unsigned int idx, j;
+      gimple *sum = NULL;
+      v_info_ptr info_ptr;
+      tree sum_vec = tvec;
+      v_info_elem *elem;
+
+      /* Build the sum for all candidates with same mode.  */
+      do
+	{
+	  sum = build_and_add_sum (TREE_TYPE (sum_vec), sum_vec,
+				   valid_vecs[i + 1], opcode);
+	  sum_vec = gimple_get_lhs (sum);
+	  info_ptr = *(v_info_map.get (valid_vecs[i + 1]));
+	  /* Update those related ops of current candidate VECTOR.  */
+	  FOR_EACH_VEC_ELT (*info_ptr, j, elem)
+	    {
+	      idx = elem->second;
+	      gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+	      /* Set this so the op's definition will get DCEd later.  */
+	      gimple_set_visited (def, true);
+	      if (opcode == PLUS_EXPR || opcode == BIT_XOR_EXPR
+		  || opcode == BIT_IOR_EXPR)
+		(*ops)[idx]->op = build_zero_cst (TREE_TYPE ((*ops)[idx]->op));
+	      else if (opcode == MULT_EXPR)
+		(*ops)[idx]->op = build_one_cst (TREE_TYPE ((*ops)[idx]->op));
+	      else
+		{
+		  gcc_assert (opcode == BIT_AND_EXPR);
+		  (*ops)[idx]->op
+		    = build_all_ones_cst (TREE_TYPE ((*ops)[idx]->op));
+		}
+	      (*ops)[idx]->rank = 0;
+	    }
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Generating addition -> ");
+	      print_gimple_stmt (dump_file, sum, 0);
+	    }
+	  i++;
+	}
+      while ((i < valid_vecs.length () - 1)
+	     && TYPE_MODE (TREE_TYPE (valid_vecs[i + 1])) == mode);
+
+      /* Referring to the first valid VECTOR with this mode, generate the
+         BIT_FIELD_REF statements accordingly.  */
+      info_ptr = *(v_info_map.get (tvec));
+      gcc_assert (sum);
+      tree elem_type = TREE_TYPE (TREE_TYPE (tvec));
+      FOR_EACH_VEC_ELT (*info_ptr, j, elem)
+	{
+	  idx = elem->second;
+	  tree dst = make_ssa_name (elem_type);
+	  gimple *gs = gimple_build_assign (
+	    dst, BIT_FIELD_REF,
+	    build3 (BIT_FIELD_REF, elem_type, sum_vec, TYPE_SIZE (elem_type),
+		    bitsize_int (elem->first
+				 * tree_to_uhwi (TYPE_SIZE (elem_type)))));
+	  insert_stmt_after (gs, sum);
+	  gimple *def = SSA_NAME_DEF_STMT ((*ops)[idx]->op);
+	  /* Set this so the op's definition will get DCEd later.  */
+	  gimple_set_visited (def, true);
+	  (*ops)[idx]->op = gimple_assign_lhs (gs);
+	  (*ops)[idx]->rank = get_rank ((*ops)[idx]->op);
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    {
+	      fprintf (dump_file, "Generating bit_field_ref -> ");
+	      print_gimple_stmt (dump_file, gs, 0);
+	    }
+	}
+    }
+
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    fprintf (dump_file, "undistributing bit_field_ref for vector done.\n");
+
+  cleanup_vinfo_map (v_info_map);
+
+  return true;
+}
+
 /* If OPCODE is BIT_IOR_EXPR or BIT_AND_EXPR and CURR is a comparison
    expression, examine the other OPS to see if any of them are comparisons
    of the same values, which we may be able to combine or eliminate.
@@ -5881,11 +6149,6 @@ reassociate_bb (basic_block bb)
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
-	  /* If this is not a gimple binary expression, there is
-	     nothing for us to do with it.  */
-	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
-	    continue;
-
 	  /* If this was part of an already processed statement,
 	     we don't need to touch it again. */
 	  if (gimple_visited_p (stmt))
@@ -5912,6 +6175,11 @@ reassociate_bb (basic_block bb)
 	      continue;
 	    }
 
+	  /* If this is not a gimple binary expression, there is
+	     nothing for us to do with it.  */
+	  if (get_gimple_rhs_class (rhs_code) != GIMPLE_BINARY_RHS)
+	    continue;
+
 	  lhs = gimple_assign_lhs (stmt);
 	  rhs1 = gimple_assign_rhs1 (stmt);
 	  rhs2 = gimple_assign_rhs2 (stmt);
@@ -5950,7 +6218,12 @@ reassociate_bb (basic_block bb)
 		  ops.qsort (sort_by_operand_rank);
 		  optimize_ops_list (rhs_code, &ops);
 		}
-
+	      if (undistribute_bitref_for_vector (rhs_code, &ops,
+						  loop_containing_stmt (stmt)))
+		{
+		  ops.qsort (sort_by_operand_rank);
+		  optimize_ops_list (rhs_code, &ops);
+		}
 	      if (rhs_code == PLUS_EXPR
 		  && transform_add_to_multiply (&ops))
 		ops.qsort (sort_by_operand_rank);
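
[Editorial note: the validation step in undistribute_bitref_for_vector above --
every element index seen exactly once and no holes left in the sbitmap -- is
equivalent to the small standalone sketch below.  This is a reader's model, not
code from the patch; covers_whole_vector is an illustrative name, and a plain
64-bit mask stands in for the sbitmap.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* IDX holds the element index of each BIT_FIELD_REF recorded for one
   vector.  The references are usable only if they cover elements
   0 .. num_elems-1 exactly once: no duplicates (the "valid" flag in the
   patch) and no holes left over (the bitmap_empty_p check).  Limited to
   64 elements here so a plain mask can replace the sbitmap.  */
static bool
covers_whole_vector (const unsigned *idx, size_t len, unsigned num_elems)
{
  /* Quick out: exact coverage needs exactly num_elems references.  */
  if (len != num_elems || num_elems == 0 || num_elems > 64)
    return false;

  unsigned long long holes
    = (num_elems == 64) ? ~0ULL : ((1ULL << num_elems) - 1);
  for (size_t i = 0; i < len; ++i)
    {
      if (idx[i] >= num_elems)
	return false;			/* index out of range */
      unsigned long long bit = 1ULL << idx[i];
      if (!(holes & bit))
	return false;			/* duplicate index */
      holes &= ~bit;
    }
  return holes == 0;			/* no holes remain */
}
```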



* Re: [PATCH V5] PR88497 - Extend reassoc for vector bit_field_ref
  2019-07-11 13:51           ` [PATCH V5] " Kewen.Lin
@ 2019-07-12 10:07             ` Richard Biener
  0 siblings, 0 replies; 16+ messages in thread
From: Richard Biener @ 2019-07-12 10:07 UTC (permalink / raw)
  To: Kewen.Lin; +Cc: GCC Patches, Bill Schmidt, Segher Boessenkool

[-- Attachment #1: Type: text/plain, Size: 3423 bytes --]

On Thu, 11 Jul 2019, Kewen.Lin wrote:

> Hi Richard,
> 
> on 2019/7/10 7:50 PM, Richard Biener wrote:
> > On Mon, 8 Jul 2019, Kewen.Lin wrote:
> > 
> > 
> > +      tree rhs = gimple_assign_rhs1 (oe1def);
> > +      tree vec = TREE_OPERAND (rhs, 0);
> > +      tree vec_type = TREE_TYPE (vec);
> > +
> > +      if (TREE_CODE (vec) != SSA_NAME || !VECTOR_TYPE_P (vec_type))
> > +       continue;
> > ...
> > +      /* Ignore it if target machine can't support this VECTOR type.  */
> > +      if (!VECTOR_MODE_P (TYPE_MODE (vec_type)))
> > +       continue;
> > 
> > put the check below next to the one above, it's cheaper than the
> > poly-int stuff that currently preceeds it.
> > 
> > +      v_info_ptr *info_ptr = v_info_map.get (vec);
> > +      if (info_ptr)
> > +       {
> > 
> > there is get_or_insert which enables you to mostly merge the two cases
> > with a preceeding
> > 
> >   if (!existed)
> >     v_info_ptr = new v_info;
> > 
> > also eliding the extra hash-lookup the put operation otherwise requires.
> > 
> > +  for (hash_map<tree, v_info_ptr>::iterator it = v_info_map.begin ();
> > +       it != v_info_map.end (); ++it)
> > +    {
> > +      tree cand_vec = (*it).first;
> > +      v_info_ptr cand_info = (*it).second;
> > +      unsigned int num_elems = VECTOR_CST_NELTS (cand_vec).to_constant 
> > ();
> > 
> > please add a quick out like
> > 
> >       if (cand_info->length () != num_elems)
> >         continue;
> > 
> > since to have no holes and no overlap you need exactly that many.
> > 
> > I think you can avoid the somewhat ugly mode_to_total counter array.
> > Since you have sorted valid_vecs after modes simply do
> > 
> >   for (unsigned i = 0; i < valid_vecs.length () - 1; ++i)
> >     {
> >       tree tvec = valid_vecs[i];
> >       enum machine_mode mode = TYPE_MODE (TREE_TYPE (tvec));
> > 
> > (please don't use unsigned int for mode!)
> > 
> >       /* Skip modes with only a single candidate.  */
> >       if (TYPE_MODE (TREE_TYPE (valid_vecs[i+1])) != mode)
> >         continue;
> > 
> >       do
> >         {
> > ...
> >         }
> >       while (...)
> > 
> > Richard.
> > 
> 
> Thanks a lot for your comments, I've updated the patch accordingly (also
> including Segher's comments on test cases).
> 
> Bootstrapped and regression test passed on powerpc64le-unknown-linux-gnu
> and x86_64-linux-gnu.
> 
> Is it ok for trunk? 

This is OK.

Thanks,
Richard.

> Thanks,
> Kewen
> 
> -----------
> 
> gcc/ChangeLog
> 
> 2019-07-11  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	PR tree-optimization/88497
> 	* tree-ssa-reassoc.c (reassociate_bb): Swap the positions of
> 	GIMPLE_BINARY_RHS check and gimple_visited_p check, call new
> 	function undistribute_bitref_for_vector.
> 	(undistribute_bitref_for_vector): New function.
> 	(cleanup_vinfo_map): Likewise.
> 	(sort_by_mach_mode): Likewise.
> 
> gcc/testsuite/ChangeLog
> 
> 2019-07-11  Kewen Lin  <linkw@gcc.gnu.org>
> 
> 	* gcc.dg/tree-ssa/pr88497-1.c: New test.
> 	* gcc.dg/tree-ssa/pr88497-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-3.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-5.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-6.c: Likewise.
> 	* gcc.dg/tree-ssa/pr88497-7.c: Likewise.
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)


end of thread, other threads:[~2019-07-12 10:06 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-20  3:33 [PATCH V3] PR88497 - Extend reassoc for vector bit_field_ref Kewen.Lin
2019-04-03 22:00 ` [PING] " Kewen.Lin
2019-05-05  6:15 ` Kewen.Lin
2019-05-21  2:03   ` Kewen.Lin
2019-06-11  2:46     ` [PING^4] " Kewen.Lin
2019-06-26  5:37       ` [PING^5] " Kewen.Lin
2019-07-02 12:43 ` Richard Biener
2019-07-03  3:20   ` Kewen.Lin
2019-07-03 12:21     ` Richard Biener
2019-07-08  8:14       ` [PATCH V4] " Kewen.Lin
2019-07-08 16:56         ` Segher Boessenkool
2019-07-09  2:37           ` Kewen.Lin
2019-07-09 16:51             ` Segher Boessenkool
2019-07-10 11:54         ` Richard Biener
2019-07-11 13:51           ` [PATCH V5] " Kewen.Lin
2019-07-12 10:07             ` Richard Biener
