public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
@ 2016-10-19 10:39 Bin Cheng
  2016-10-19 10:41 ` Richard Biener
  2017-06-10  9:40 ` Richard Sandiford
  0 siblings, 2 replies; 9+ messages in thread
From: Bin Cheng @ 2016-10-19 10:39 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd

[-- Attachment #1: Type: text/plain, Size: 1931 bytes --]

Hi,
This is the patch fixing spec mis-compare issue reported in PR78005.  In the original patch, boundary condition was mis-handled when generating guard checking if vectorized/scalar loop should be preferred.  This isn't an issue in general, but in cases we need to peel for gaps, as well as skipping branch of epilogue loop can be optimized out, it could result in more iterations being executed.  That's why the mis-compare issue happens.  This patch corrects the bug, also cleans up existing code that computes loop niters and its upper bound.  The refactoring improves niters bound information for epilogue loop, thus the patch adjusts related tests.  An execution test is also added.

With the patch, mis-compare issue is fixed.  Bootstrap and test on x86_64 and AArch64.  Is it OK?

Thanks,
bin

2016-10-18  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/78005
	* tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Compute
	upper (included) bound for niters of prolog loop.
	(vect_gen_scalar_loop_niters): Change parameter VF to VFM1.
	Compute niters of scalar loop above which vectorized loop is
	preferred, as well as the upper (included) bound for the niters.
	(vect_do_peeling): Record niter bound for loops accordingly.

gcc/testsuite/ChangeLog
2016-10-18  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/78005
	* gcc.dg/vect/pr78005.c: New.
	* gcc.target/i386/l_fma_float_1.c: Revise test.
	* gcc.target/i386/l_fma_float_2.c: Ditto.
	* gcc.target/i386/l_fma_float_3.c: Ditto.
	* gcc.target/i386/l_fma_float_4.c: Ditto.
	* gcc.target/i386/l_fma_float_5.c: Ditto.
	* gcc.target/i386/l_fma_float_6.c: Ditto.
	* gcc.target/i386/l_fma_double_1.c: Ditto.
	* gcc.target/i386/l_fma_double_2.c: Ditto.
	* gcc.target/i386/l_fma_double_3.c: Ditto.
	* gcc.target/i386/l_fma_double_4.c: Ditto.
	* gcc.target/i386/l_fma_double_5.c: Ditto.
	* gcc.target/i386/l_fma_double_6.c: Ditto.

[-- Attachment #2: pr78005-20161016.txt --]
[-- Type: text/plain, Size: 21668 bytes --]

diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 291ecd9..6bfd332 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -904,7 +904,7 @@ vect_update_ivs_after_vectorizer (loop_vec_info loop_vinfo,
    is the inner type of the vectype)
 
    The computations will be emitted at the end of BB.  We also compute and
-   store upper bound of the result in BOUND.
+   store upper bound (included) of the result in BOUND.
 
    When the step of the data-ref in the loop is not 1 (as in interleaved data
    and SLP), the number of iterations of the prolog must be divided by the step
@@ -941,7 +941,7 @@ vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
                          "known peeling = %d.\n", npeel);
 
       iters = build_int_cst (niters_type, npeel);
-      *bound = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) + 1;
+      *bound = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
     }
   else
     {
@@ -976,7 +976,7 @@ vect_gen_prolog_loop_niters (loop_vec_info loop_vinfo,
 	iters = fold_build2 (MINUS_EXPR, type, nelements_tree, elem_misalign);
       iters = fold_build2 (BIT_AND_EXPR, type, iters, nelements_minus_1);
       iters = fold_convert (niters_type, iters);
-      *bound = nelements;
+      *bound = nelements - 1;
     }
 
   if (dump_enabled_p ())
@@ -1090,43 +1090,47 @@ vect_build_loop_niters (loop_vec_info loop_vinfo)
     }
 }
 
-/* Calculate the number of iterations under which scalar loop will be
-   preferred than vectorized loop.  NITERS_PROLOG is the number of
-   iterations of prolog loop.  If it's integer const, the integer
-   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
-   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
-   true.  This function also store upper bound of the result in BOUND.  */
+/* Calculate the number of iterations above which vectorized loop will be
+   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
+   of prolog loop.  If it's integer const, the integer number is also passed
+   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
+   number of iterations of prolog loop.  VFM1 is vector factor minus one.
+   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
+   (rather than vectorized) loop will be executed.  This function stores
+   upper bound (included) of the result in BOUND_SCALAR.  */
 
 static tree
 vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
-			     int bound_prolog, int vf, int th, int *bound,
-			     bool check_profitability)
+			     int bound_prolog, int vfm1, int th,
+			     int *bound_scalar, bool check_profitability)
 {
   tree type = TREE_TYPE (niters_prolog);
   tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
-			     build_int_cst (type, vf));
+			     build_int_cst (type, vfm1));
 
-  *bound = vf + bound_prolog;
+  *bound_scalar = vfm1 + bound_prolog;
   if (check_profitability)
     {
-      th++;
+      /* TH indicates the minimum niters of vectorized loop, while we
+	 compute the maximum niters of scalar loop.  */
+      th--;
       /* Peeling for constant times.  */
       if (int_niters_prolog >= 0)
 	{
-	  *bound = (int_niters_prolog + vf < th
-					   ? th
-					   : vf + int_niters_prolog);
-	  return build_int_cst (type, *bound);
+	  *bound_scalar = (int_niters_prolog + vfm1 < th
+			    ? th
+			    : vfm1 + int_niters_prolog);
+	  return build_int_cst (type, *bound_scalar);
 	}
-      /* Peeling for unknown times, in this case, prolog loop must
-	 execute less than bound_prolog times.  */
-      if (th >=  vf + bound_prolog - 1)
+      /* Peeling for unknown times.  Note BOUND_PROLOG is the upper
+	 bound (inlcuded) of niters of prolog loop.  */
+      if (th >=  vfm1 + bound_prolog)
 	{
-	  *bound = th;
+	  *bound_scalar = th;
 	  return build_int_cst (type, th);
 	}
-      /* Need to do runtime comparison, but bound remains the same.  */
-      else if (th > vf)
+      /* Need to do runtime comparison, but BOUND_SCALAR remains the same.  */
+      else if (th > vfm1)
 	return fold_build2 (MAX_EXPR, type, build_int_cst (type, th), niters);
     }
   return niters;
@@ -1620,7 +1624,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   tree type = TREE_TYPE (niters), guard_cond;
   basic_block guard_bb, guard_to;
   int prob_prolog, prob_vector, prob_epilog;
-  int bound_prolog = 0, bound_epilog = 0, bound = 0;
+  int bound_prolog = 0, bound_scalar = 0, bound = 0;
   int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
   int prolog_peeling = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
   bool epilog_peeling = (LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo)
@@ -1721,9 +1725,9 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 		       LOOP_VINFO_NITERSM1 (loop_vinfo), niters_prolog);
       niters = vect_build_loop_niters (loop_vinfo);
 
-      /* Prolog iterates at most bound_prolog - 1 times, latch iterates
-	 at most bound_prolog - 2 times.  */
-      record_niter_bound (prolog, bound_prolog - 2, false, true);
+      /* Prolog iterates at most bound_prolog times, latch iterates at
+	 most bound_prolog - 1 times.  */
+      record_niter_bound (prolog, bound_prolog - 1, false, true);
       delete_update_ssa ();
       adjust_vec_debug_stmts ();
       scev_reset ();
@@ -1754,16 +1758,15 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	 won't be vectorized.  */
       if (skip_vector)
 	{
-	  /* Guard_cond needs is based on NITERSM1 because NITERS might
-	     overflow, so here it is niters_scalar - 1 generated.  In
-	     other words, both niters_scalar and bound_epilog are for
-	     scalar loop's latch.  */
+	  /* Additional epilogue iteration is peeled if gap exists.  */
+	  bool peel_for_gaps = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo);
 	  tree t = vect_gen_scalar_loop_niters (niters_prolog, prolog_peeling,
-						bound_prolog, vf - 1, th - 1,
-						&bound_epilog,
+						bound_prolog,
+						peel_for_gaps ? vf : vf - 1,
+						th, &bound_scalar,
 						check_profitability);
-	  guard_cond = fold_build2 (LT_EXPR, boolean_type_node,
-				    nitersm1, t);
+	  /* Build guard against NITERSM1 since NITERS may overflow.  */
+	  guard_cond = fold_build2 (LT_EXPR, boolean_type_node, nitersm1, t);
 	  guard_bb = anchor;
 	  guard_to = split_edge (loop_preheader_edge (epilog));
 	  guard_e = slpeel_add_loop_guard (guard_bb, guard_cond,
@@ -1772,7 +1775,7 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
 	  e = EDGE_PRED (guard_to, 0);
 	  e = (e != guard_e ? e : EDGE_PRED (guard_to, 1));
 	  slpeel_update_phi_nodes_for_guard1 (first_loop, epilog, guard_e, e);
-	  scale_loop_profile (epilog, prob_vector, bound_epilog);
+	  scale_loop_profile (epilog, prob_vector, bound_scalar);
 	}
 
       tree niters_vector_mult_vf;
@@ -1807,10 +1810,10 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
       else
 	slpeel_update_phi_nodes_for_lcssa (epilog);
 
-      bound = (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? vf * 2 : vf) - 2;
+      bound = LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo) ? vf - 1 : vf - 2;
       /* We share epilog loop with scalar version loop.  */
-      bound_epilog = MAX (bound, bound_epilog - 1);
-      record_niter_bound (epilog, bound_epilog, false, true);
+      bound = MAX (bound, bound_scalar - 1);
+      record_niter_bound (epilog, bound, false, true);
 
       delete_update_ssa ();
       adjust_vec_debug_stmts ();
diff --git a/gcc/testsuite/gcc.dg/vect/pr78005.c b/gcc/testsuite/gcc.dg/vect/pr78005.c
new file mode 100644
index 0000000..7cefe73
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr78005.c
@@ -0,0 +1,48 @@
+/* { dg-require-effective-target vect_int } */
+#include "tree-vect.h"
+
+#define N 20
+int u[N] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
+int z[N] = {-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10, 11, 12, 13, 14, 15, 16, 17, 18};
+int res4[N] = {0, 1, 8, 3, 22, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
+int res5[N] = {0, 1, 8, 3, 22, 5, 36, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
+int res6[N] = {0, 1, 8, 3, 22, 5, 36, 7, 50, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19};
+int res7[N] = {0, 1, 8, 3, 22, 5, 36, 7, 50, 9, 64, 11, 12, 13, 14, 15, 16, 17, 18, 19};
+int res8[N] = {0, 1, 8, 3, 22, 5, 36, 7, 50, 9, 64, 11, 78, 13, 14, 15, 16, 17, 18, 19};
+int res9[N] = {0, 1, 8, 3, 22, 5, 36, 7, 50, 9, 64, 11, 78, 13, 92, 15, 16, 17, 18, 19};
+int res10[N] = {0, 1, 8, 3, 22, 5, 36, 7, 50, 9, 64, 11, 78, 13, 92, 15, 106, 17, 18, 19};
+
+__attribute__ ((noinline)) void
+foo (int n, int d)
+{
+  int i;
+  for (i = 2; i < n; i++)
+    u[2*i-2] = u[2*i-2] + d * (z[i-1] + z[i] + z[i-1] + z[i] + z[i-1] + z[i]);
+}
+
+#define check_u(x)		\
+  foo (x, 2);			\
+  for (i = 0; i < N; i++)	\
+    {				\
+      if (u[i] != res##x[i])	\
+	abort ();		\
+      u[i] = i;			\
+    }
+
+int main(void)
+{
+  int i;
+
+  check_vect ();
+
+  /* Need to check for all possible vector factors.  */
+  check_u(4);
+  check_u(5);
+  check_u(6);
+  check_u(7);
+  check_u(8);
+  check_u(9);
+  check_u(10);
+
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
index ea90a35..c8ea28a 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_1.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
index d604d57..bd873bc 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_2.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
index d1ac6a5..7677ecb 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_3.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
index 58cd272..e5de796 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_4.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c
index 6005a18..dfeb96b 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_5.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_5.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_double_6.c b/gcc/testsuite/gcc.target/i386/l_fma_double_6.c
index 3289baa..13cbd7a 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_double_6.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_double_6.c
@@ -13,7 +13,7 @@ typedef double adouble __attribute__((aligned(sizeof (double))));
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+pd" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+pd" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 88 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 88 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+sd" 80 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+sd" 80 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_1.c b/gcc/testsuite/gcc.target/i386/l_fma_float_1.c
index 8ecf81c..f20c009 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_1.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_1.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_2.c b/gcc/testsuite/gcc.target/i386/l_fma_float_2.c
index a0cb9c7..ce4efe9 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_2.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_2.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_3.c b/gcc/testsuite/gcc.target/i386/l_fma_float_3.c
index 9045ce4..91b953a 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_3.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_3.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_4.c b/gcc/testsuite/gcc.target/i386/l_fma_float_4.c
index 3a55211..447be12 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_4.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_4.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_5.c b/gcc/testsuite/gcc.target/i386/l_fma_float_5.c
index 6e5cbea..99a5b96 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_5.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_5.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */
diff --git a/gcc/testsuite/gcc.target/i386/l_fma_float_6.c b/gcc/testsuite/gcc.target/i386/l_fma_float_6.c
index bf4edcf..02f4b10 100644
--- a/gcc/testsuite/gcc.target/i386/l_fma_float_6.c
+++ b/gcc/testsuite/gcc.target/i386/l_fma_float_6.c
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmadd\[123\]+ps" 8 } } */
 /* { dg-final { scan-assembler-times "vfnmsub\[123\]+ps" 8 } } */
-/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 184 } } */
-/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 184 } } */
+/* { dg-final { scan-assembler-times "vfmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfmsub\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmadd\[123\]+ss" 176 } } */
+/* { dg-final { scan-assembler-times "vfnmsub\[123\]+ss" 176 } } */

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2016-10-19 10:39 [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop Bin Cheng
@ 2016-10-19 10:41 ` Richard Biener
  2017-06-10  9:40 ` Richard Sandiford
  1 sibling, 0 replies; 9+ messages in thread
From: Richard Biener @ 2016-10-19 10:41 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

On Wed, Oct 19, 2016 at 12:38 PM, Bin Cheng <Bin.Cheng@arm.com> wrote:
> Hi,
> This is the patch fixing spec mis-compare issue reported in PR78005.  In the original patch, boundary condition was mis-handled when generating guard checking if vectorized/scalar loop should be preferred.  This isn't an issue in general, but in cases we need to peel for gaps, as well as skipping branch of epilogue loop can be optimized out, it could result in more iterations being executed.  That's why the mis-compare issue happens.  This patch corrects the bug, also cleans up existing code that computes loop niters and its upper bound.  The refactoring improves niters bound information for epilogue loop, thus the patch adjusts related tests.  An execution test is also added.
>
> With the patch, mis-compare issue is fixed.  Bootstrap and test on x86_64 and AArch64.  Is it OK?

Ok.

Thanks,
Richard.

> Thanks,
> bin
>
> 2016-10-18  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/78005
>         * tree-vect-loop-manip.c (vect_gen_prolog_loop_niters): Compute
>         upper (included) bound for niters of prolog loop.
>         (vect_gen_scalar_loop_niters): Change parameter VF to VFM1.
>         Compute niters of scalar loop above which vectorized loop is
>         preferred, as well as the upper (included) bound for the niters.
>         (vect_do_peeling): Record niter bound for loops accordingly.
>
> gcc/testsuite/ChangeLog
> 2016-10-18  Bin Cheng  <bin.cheng@arm.com>
>
>         PR tree-optimization/78005
>         * gcc.dg/vect/pr78005.c: New.
>         * gcc.target/i386/l_fma_float_1.c: Revise test.
>         * gcc.target/i386/l_fma_float_2.c: Ditto.
>         * gcc.target/i386/l_fma_float_3.c: Ditto.
>         * gcc.target/i386/l_fma_float_4.c: Ditto.
>         * gcc.target/i386/l_fma_float_5.c: Ditto.
>         * gcc.target/i386/l_fma_float_6.c: Ditto.
>         * gcc.target/i386/l_fma_double_1.c: Ditto.
>         * gcc.target/i386/l_fma_double_2.c: Ditto.
>         * gcc.target/i386/l_fma_double_3.c: Ditto.
>         * gcc.target/i386/l_fma_double_4.c: Ditto.
>         * gcc.target/i386/l_fma_double_5.c: Ditto.
>         * gcc.target/i386/l_fma_double_6.c: Ditto.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2016-10-19 10:39 [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop Bin Cheng
  2016-10-19 10:41 ` Richard Biener
@ 2017-06-10  9:40 ` Richard Sandiford
  2017-06-11 22:24   ` Bin.Cheng
  1 sibling, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-06-10  9:40 UTC (permalink / raw)
  To: Bin Cheng; +Cc: gcc-patches, nd

Sorry to return this old patch, but:

Bin Cheng <Bin.Cheng@arm.com> writes:
> -/* Calculate the number of iterations under which scalar loop will be
> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> -   iterations of prolog loop.  If it's integer const, the integer
> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> -   true.  This function also store upper bound of the result in BOUND.  */
> +/* Calculate the number of iterations above which vectorized loop will be
> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> +   of prolog loop.  If it's integer const, the integer number is also passed
> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> +   (rather than vectorized) loop will be executed.  This function stores
> +   upper bound (included) of the result in BOUND_SCALAR.  */
>  
>  static tree
>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> -			     int bound_prolog, int vf, int th, int *bound,
> -			     bool check_profitability)
> +			     int bound_prolog, int vfm1, int th,
> +			     int *bound_scalar, bool check_profitability)
>  {
>    tree type = TREE_TYPE (niters_prolog);
>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> -			     build_int_cst (type, vf));
> +			     build_int_cst (type, vfm1));
>  
> -  *bound = vf + bound_prolog;
> +  *bound_scalar = vfm1 + bound_prolog;
>    if (check_profitability)
>      {
> -      th++;
> +      /* TH indicates the minimum niters of vectorized loop, while we
> +	 compute the maximum niters of scalar loop.  */
> +      th--;

Are you sure about this last change?  It looks like it should be dropping
the increment rather than replacing it with a decrement.

It looks like the threshold is already the maximum niters for the scalar
loop.  It's set by:

  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
			    * vectorization_factor) - 1);

  /* Use the cost model only if it is more conservative than user specified
     threshold.  */
  th = (unsigned) min_scalar_loop_bound;
  if (min_profitable_iters
      && (!min_scalar_loop_bound
          || min_profitable_iters > min_scalar_loop_bound))
    th = (unsigned) min_profitable_iters;

  LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;

(Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
is the minimum niters for the vector loop.)

min_profitable_iters sounds like it _ought_ to be the minimum niters for
which the vector loop is used, but vect_estimate_min_profitable_iters
instead returns the largest niters for which the scalar loop should be
preferred:

  /* Cost model disabled.  */
  if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
    {
      dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
      *ret_min_profitable_niters = 0;
      *ret_min_profitable_estimate = 0;
      return;
    }
  [...]
  /* Because the condition we create is:
     if (niters <= min_profitable_iters)
       then skip the vectorized loop.  */
  min_profitable_iters--;
  [...]
  min_profitable_estimate --;

Also, when deciding whether the threshold needs to be used, we have:

  th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
    {
      if (dump_enabled_p ())
	dump_printf_loc (MSG_NOTE, vect_location,
			 "Profitability threshold is %d loop iterations.\n",
                         th);
      check_profitability = true;
    }

where again the "- 1" seems to assume that the threshold is already the
maximum niters for the scalar loop.  (Although checking for == seems a bit
pointless too.)

The corresponding profitability check for loop versioning still assumes
that th is the maximum niters for the scalar loop:

  if (check_profitability)
    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
			     build_int_cst (TREE_TYPE (scalar_loop_iters),
						       th));

since we use GT_EXPR instead of GE_EXPR.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-06-10  9:40 ` Richard Sandiford
@ 2017-06-11 22:24   ` Bin.Cheng
  2017-06-12  8:19     ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Bin.Cheng @ 2017-06-11 22:24 UTC (permalink / raw)
  To: gcc-patches, Richard Sandiford, Richard Biener

On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Sorry to return this old patch, but:
>
> Bin Cheng <Bin.Cheng@arm.com> writes:
>> -/* Calculate the number of iterations under which scalar loop will be
>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>> -   iterations of prolog loop.  If it's integer const, the integer
>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>> -   true.  This function also store upper bound of the result in BOUND.  */
>> +/* Calculate the number of iterations above which vectorized loop will be
>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>> +   of prolog loop.  If it's integer const, the integer number is also passed
>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>> +   (rather than vectorized) loop will be executed.  This function stores
>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>>
>>  static tree
>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>> -                          int bound_prolog, int vf, int th, int *bound,
>> -                          bool check_profitability)
>> +                          int bound_prolog, int vfm1, int th,
>> +                          int *bound_scalar, bool check_profitability)
>>  {
>>    tree type = TREE_TYPE (niters_prolog);
>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>> -                          build_int_cst (type, vf));
>> +                          build_int_cst (type, vfm1));
>>
>> -  *bound = vf + bound_prolog;
>> +  *bound_scalar = vfm1 + bound_prolog;
>>    if (check_profitability)
>>      {
>> -      th++;
>> +      /* TH indicates the minimum niters of vectorized loop, while we
>> +      compute the maximum niters of scalar loop.  */
>> +      th--;
>
> Are you sure about this last change?  It looks like it should be dropping
Hi Richard,
Thanks for spotting this.  I vaguely remember I got this from the way
how niter/th was checked in previous peeling code, but did't double
check it now.  I tend to believe there is inconsistence about th,
especially with comment like:

  /* Threshold of number of iterations below which vectorzation will not be
     performed. It is calculated from MIN_PROFITABLE_ITERS and
     PARAM_MIN_VECT_LOOP_BOUND. */
  unsigned int th;

I also tend to believe the inconsistence was introduced partly because
niters in vectorizer stands for latch_niters + 1, while latch_niters
in rest of the compiler.

and...,

> the increment rather than replacing it with a decrement.
>
> It looks like the threshold is already the maximum niters for the scalar
> loop.  It's set by:
>
>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>                             * vectorization_factor) - 1);
>
>   /* Use the cost model only if it is more conservative than user specified
>      threshold.  */
>   th = (unsigned) min_scalar_loop_bound;
>   if (min_profitable_iters
>       && (!min_scalar_loop_bound
>           || min_profitable_iters > min_scalar_loop_bound))
>     th = (unsigned) min_profitable_iters;
>
>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>
> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
> is the minimum niters for the vector loop.)
To be honest, min_scalar_loop_bound is more likely for something's
lower bound which is the niters for the vector loop.  If it refers to
the niters scalar loop, it is in actuality the "max" value we should
use.  I am not quite sure here, partly because I am not a native
speaker.

>
> min_profitable_iters sounds like it _ought_ to be the minimum niters for
> which the vector loop is used, but vect_estimate_min_profitable_iters
> instead returns the largest niters for which the scalar loop should be
> preferred:
>
>   /* Cost model disabled.  */
>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>     {
>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>       *ret_min_profitable_niters = 0;
>       *ret_min_profitable_estimate = 0;
>       return;
>     }
>   [...]
>   /* Because the condition we create is:
>      if (niters <= min_profitable_iters)
>        then skip the vectorized loop.  */
>   min_profitable_iters--;
>   [...]
>   min_profitable_estimate --;
>
> Also, when deciding whether the threshold needs to be used, we have:
>
>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>     {
>       if (dump_enabled_p ())
>         dump_printf_loc (MSG_NOTE, vect_location,
>                          "Profitability threshold is %d loop iterations.\n",
>                          th);
>       check_profitability = true;
>     }
Yes, this indeed indicates that th refers to the maximum niters of the
scalar loop.  Anyway, we should resolve the inconsistence and make it
more clear in variable names etc..

Thanks,
bin
>
> where again the "- 1" seems to assume that the threshold is already the
> maximum niters for the scalar loop.  (Although checking for == seems a bit
> pointless too.)
>
> The corresponding profitability check for loop versioning still assumes
> that th is the maximum niters for the scalar loop:
>
>   if (check_profitability)
>     cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
>                              build_int_cst (TREE_TYPE (scalar_loop_iters),
>                                                        th));
>
> since we use GT_EXPR instead of GE_EXPR.
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-06-11 22:24   ` Bin.Cheng
@ 2017-06-12  8:19     ` Richard Sandiford
  2017-06-12  8:32       ` Bin.Cheng
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-06-12  8:19 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches, Richard Biener

"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Sorry to return this old patch, but:
>>
>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>> -/* Calculate the number of iterations under which scalar loop will be
>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>>> -   iterations of prolog loop.  If it's integer const, the integer
>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>>> -   true.  This function also store upper bound of the result in BOUND.  */
>>> +/* Calculate the number of iterations above which vectorized loop will be
>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>>> +   of prolog loop.  If it's integer const, the integer number is also passed
>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>>> +   (rather than vectorized) loop will be executed.  This function stores
>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>>>
>>>  static tree
>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>>> -                          int bound_prolog, int vf, int th, int *bound,
>>> -                          bool check_profitability)
>>> +                          int bound_prolog, int vfm1, int th,
>>> +                          int *bound_scalar, bool check_profitability)
>>>  {
>>>    tree type = TREE_TYPE (niters_prolog);
>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>>> -                          build_int_cst (type, vf));
>>> +                          build_int_cst (type, vfm1));
>>>
>>> -  *bound = vf + bound_prolog;
>>> +  *bound_scalar = vfm1 + bound_prolog;
>>>    if (check_profitability)
>>>      {
>>> -      th++;
>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>>> +      compute the maximum niters of scalar loop.  */
>>> +      th--;
>>
>> Are you sure about this last change?  It looks like it should be dropping
> Hi Richard,
> Thanks for spotting this.  I vaguely remember I got this from the way
> how niter/th was checked in previous peeling code, but did't double
> check it now.  I tend to believe there is inconsistence about th,
> especially with comment like:
>
>   /* Threshold of number of iterations below which vectorzation will not be
>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>      PARAM_MIN_VECT_LOOP_BOUND. */
>   unsigned int th;
>
> I also tend to believe the inconsistence was introduced partly because
> niters in vectorizer stands for latch_niters + 1, while latch_niters
> in rest of the compiler.
>
> and...,
>
>> the increment rather than replacing it with a decrement.
>>
>> It looks like the threshold is already the maximum niters for the scalar
>> loop.  It's set by:
>>
>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>>                             * vectorization_factor) - 1);
>>
>>   /* Use the cost model only if it is more conservative than user specified
>>      threshold.  */
>>   th = (unsigned) min_scalar_loop_bound;
>>   if (min_profitable_iters
>>       && (!min_scalar_loop_bound
>>           || min_profitable_iters > min_scalar_loop_bound))
>>     th = (unsigned) min_profitable_iters;
>>
>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>>
>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>> is the minimum niters for the vector loop.)
> To be honest, min_scalar_loop_bound is more likely for something's
> lower bound which is the niters for the vector loop.  If it refers to
> the niters scalar loop, it is in actuality the "max" value we should
> use.  I am not quite sure here, partly because I am not a native
> speaker.
>
>>
>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>> which the vector loop is used, but vect_estimate_min_profitable_iters
>> instead returns the largest niters for which the scalar loop should be
>> preferred:
>>
>>   /* Cost model disabled.  */
>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>>     {
>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>>       *ret_min_profitable_niters = 0;
>>       *ret_min_profitable_estimate = 0;
>>       return;
>>     }
>>   [...]
>>   /* Because the condition we create is:
>>      if (niters <= min_profitable_iters)
>>        then skip the vectorized loop.  */
>>   min_profitable_iters--;
>>   [...]
>>   min_profitable_estimate --;
>>
>> Also, when deciding whether the threshold needs to be used, we have:
>>
>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>     {
>>       if (dump_enabled_p ())
>>         dump_printf_loc (MSG_NOTE, vect_location,
>>                          "Profitability threshold is %d loop iterations.\n",
>>                          th);
>>       check_profitability = true;
>>     }
> Yes, this indeed indicates that th refers to the maximum niters of the
> scalar loop.  Anyway, we should resolve the inconsistence and make it
> more clear in variable names etc..

Yeah, agree the variable names are really confusing.  Given the choice
between changing the names to match the current meaning and changing
the meaning to match the current names, the latter looks like it would
give cleaner code.  I'm happy to do that it that sounds like the way
to go.

FWIW, I came across this while trying make sense of the "+ 1" in:

  /* Decide whether we need to create an epilogue loop to handle
     remaining scalar iterations.  */
  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);

Thanks,
Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-06-12  8:19     ` Richard Sandiford
@ 2017-06-12  8:32       ` Bin.Cheng
  2017-06-12  8:44         ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Bin.Cheng @ 2017-06-12  8:32 UTC (permalink / raw)
  To: Bin.Cheng, gcc-patches, Richard Biener, Richard Sandiford

On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Sorry to return this old patch, but:
>>>
>>> Bin Cheng <Bin.Cheng@arm.com> writes:
>>>> -/* Calculate the number of iterations under which scalar loop will be
>>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>>>> -   iterations of prolog loop.  If it's integer const, the integer
>>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>>>> -   true.  This function also store upper bound of the result in BOUND.  */
>>>> +/* Calculate the number of iterations above which vectorized loop will be
>>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>>>> +   of prolog loop.  If it's integer const, the integer number is also passed
>>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>>>> +   (rather than vectorized) loop will be executed.  This function stores
>>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>>>>
>>>>  static tree
>>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>>>> -                          int bound_prolog, int vf, int th, int *bound,
>>>> -                          bool check_profitability)
>>>> +                          int bound_prolog, int vfm1, int th,
>>>> +                          int *bound_scalar, bool check_profitability)
>>>>  {
>>>>    tree type = TREE_TYPE (niters_prolog);
>>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>>>> -                          build_int_cst (type, vf));
>>>> +                          build_int_cst (type, vfm1));
>>>>
>>>> -  *bound = vf + bound_prolog;
>>>> +  *bound_scalar = vfm1 + bound_prolog;
>>>>    if (check_profitability)
>>>>      {
>>>> -      th++;
>>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>>>> +      compute the maximum niters of scalar loop.  */
>>>> +      th--;
>>>
>>> Are you sure about this last change?  It looks like it should be dropping
>> Hi Richard,
>> Thanks for spotting this.  I vaguely remember I got this from the way
>> how niter/th was checked in previous peeling code, but did't double
>> check it now.  I tend to believe there is inconsistence about th,
>> especially with comment like:
>>
>>   /* Threshold of number of iterations below which vectorzation will not be
>>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>>      PARAM_MIN_VECT_LOOP_BOUND. */
>>   unsigned int th;
>>
>> I also tend to believe the inconsistence was introduced partly because
>> niters in vectorizer stands for latch_niters + 1, while latch_niters
>> in rest of the compiler.
>>
>> and...,
>>
>>> the increment rather than replacing it with a decrement.
>>>
>>> It looks like the threshold is already the maximum niters for the scalar
>>> loop.  It's set by:
>>>
>>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>>>                             * vectorization_factor) - 1);
>>>
>>>   /* Use the cost model only if it is more conservative than user specified
>>>      threshold.  */
>>>   th = (unsigned) min_scalar_loop_bound;
>>>   if (min_profitable_iters
>>>       && (!min_scalar_loop_bound
>>>           || min_profitable_iters > min_scalar_loop_bound))
>>>     th = (unsigned) min_profitable_iters;
>>>
>>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>>>
>>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>>> is the minimum niters for the vector loop.)
>> To be honest, min_scalar_loop_bound is more likely for something's
>> lower bound which is the niters for the vector loop.  If it refers to
>> the niters scalar loop, it is in actuality the "max" value we should
>> use.  I am not quite sure here, partly because I am not a native
>> speaker.
>>
>>>
>>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>>> which the vector loop is used, but vect_estimate_min_profitable_iters
>>> instead returns the largest niters for which the scalar loop should be
>>> preferred:
>>>
>>>   /* Cost model disabled.  */
>>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>>>     {
>>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>>>       *ret_min_profitable_niters = 0;
>>>       *ret_min_profitable_estimate = 0;
>>>       return;
>>>     }
>>>   [...]
>>>   /* Because the condition we create is:
>>>      if (niters <= min_profitable_iters)
>>>        then skip the vectorized loop.  */
>>>   min_profitable_iters--;
>>>   [...]
>>>   min_profitable_estimate --;
>>>
>>> Also, when deciding whether the threshold needs to be used, we have:
>>>
>>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>>>     {
>>>       if (dump_enabled_p ())
>>>         dump_printf_loc (MSG_NOTE, vect_location,
>>>                          "Profitability threshold is %d loop iterations.\n",
>>>                          th);
>>>       check_profitability = true;
>>>     }
>> Yes, this indeed indicates that th refers to the maximum niters of the
>> scalar loop.  Anyway, we should resolve the inconsistence and make it
>> more clear in variable names etc..
>
> Yeah, agree the variable names are really confusing.  Given the choice
> between changing the names to match the current meaning and changing
> the meaning to match the current names, the latter looks like it would
> give cleaner code.  I'm happy to do that it that sounds like the way
> to go.
I am all for this direction, but Richard B may have some to say here.
BTW, will you also fix the other inconsistence you found?

Thanks,
bin

>
> FWIW, I came across this while trying make sense of the "+ 1" in:
>
>   /* Decide whether we need to create an epilogue loop to handle
>      remaining scalar iterations.  */
>   th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
>         / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
>        * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>
> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-06-12  8:32       ` Bin.Cheng
@ 2017-06-12  8:44         ` Richard Biener
  2017-07-03  8:07           ` Richard Sandiford
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Biener @ 2017-06-12  8:44 UTC (permalink / raw)
  To: Bin.Cheng; +Cc: gcc-patches, Richard Sandiford

On Mon, 12 Jun 2017, Bin.Cheng wrote:

> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
> > "Bin.Cheng" <amker.cheng@gmail.com> writes:
> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >>> Sorry to return this old patch, but:
> >>>
> >>> Bin Cheng <Bin.Cheng@arm.com> writes:
> >>>> -/* Calculate the number of iterations under which scalar loop will be
> >>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> >>>> -   iterations of prolog loop.  If it's integer const, the integer
> >>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> >>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> >>>> -   true.  This function also store upper bound of the result in BOUND.  */
> >>>> +/* Calculate the number of iterations above which vectorized loop will be
> >>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> >>>> +   of prolog loop.  If it's integer const, the integer number is also passed
> >>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> >>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> >>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> >>>> +   (rather than vectorized) loop will be executed.  This function stores
> >>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
> >>>>
> >>>>  static tree
> >>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> >>>> -                          int bound_prolog, int vf, int th, int *bound,
> >>>> -                          bool check_profitability)
> >>>> +                          int bound_prolog, int vfm1, int th,
> >>>> +                          int *bound_scalar, bool check_profitability)
> >>>>  {
> >>>>    tree type = TREE_TYPE (niters_prolog);
> >>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> >>>> -                          build_int_cst (type, vf));
> >>>> +                          build_int_cst (type, vfm1));
> >>>>
> >>>> -  *bound = vf + bound_prolog;
> >>>> +  *bound_scalar = vfm1 + bound_prolog;
> >>>>    if (check_profitability)
> >>>>      {
> >>>> -      th++;
> >>>> +      /* TH indicates the minimum niters of vectorized loop, while we
> >>>> +      compute the maximum niters of scalar loop.  */
> >>>> +      th--;
> >>>
> >>> Are you sure about this last change?  It looks like it should be dropping
> >> Hi Richard,
> >> Thanks for spotting this.  I vaguely remember I got this from the way
> >> how niter/th was checked in previous peeling code, but did't double
> >> check it now.  I tend to believe there is inconsistence about th,
> >> especially with comment like:
> >>
> >>   /* Threshold of number of iterations below which vectorzation will not be
> >>      performed. It is calculated from MIN_PROFITABLE_ITERS and
> >>      PARAM_MIN_VECT_LOOP_BOUND. */
> >>   unsigned int th;
> >>
> >> I also tend to believe the inconsistence was introduced partly because
> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
> >> in rest of the compiler.
> >>
> >> and...,
> >>
> >>> the increment rather than replacing it with a decrement.
> >>>
> >>> It looks like the threshold is already the maximum niters for the scalar
> >>> loop.  It's set by:
> >>>
> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> >>>                             * vectorization_factor) - 1);
> >>>
> >>>   /* Use the cost model only if it is more conservative than user specified
> >>>      threshold.  */
> >>>   th = (unsigned) min_scalar_loop_bound;
> >>>   if (min_profitable_iters
> >>>       && (!min_scalar_loop_bound
> >>>           || min_profitable_iters > min_scalar_loop_bound))
> >>>     th = (unsigned) min_profitable_iters;
> >>>
> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> >>>
> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
> >>> is the minimum niters for the vector loop.)
> >> To be honest, min_scalar_loop_bound is more likely for something's
> >> lower bound which is the niters for the vector loop.  If it refers to
> >> the niters scalar loop, it is in actuality the "max" value we should
> >> use.  I am not quite sure here, partly because I am not a native
> >> speaker.
> >>
> >>>
> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
> >>> instead returns the largest niters for which the scalar loop should be
> >>> preferred:
> >>>
> >>>   /* Cost model disabled.  */
> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> >>>     {
> >>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> >>>       *ret_min_profitable_niters = 0;
> >>>       *ret_min_profitable_estimate = 0;
> >>>       return;
> >>>     }
> >>>   [...]
> >>>   /* Because the condition we create is:
> >>>      if (niters <= min_profitable_iters)
> >>>        then skip the vectorized loop.  */
> >>>   min_profitable_iters--;
> >>>   [...]
> >>>   min_profitable_estimate --;
> >>>
> >>> Also, when deciding whether the threshold needs to be used, we have:
> >>>
> >>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> >>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> >>>     {
> >>>       if (dump_enabled_p ())
> >>>         dump_printf_loc (MSG_NOTE, vect_location,
> >>>                          "Profitability threshold is %d loop iterations.\n",
> >>>                          th);
> >>>       check_profitability = true;
> >>>     }
> >> Yes, this indeed indicates that th refers to the maximum niters of the
> >> scalar loop.  Anyway, we should resolve the inconsistence and make it
> >> more clear in variable names etc..
> >
> > Yeah, agree the variable names are really confusing.  Given the choice
> > between changing the names to match the current meaning and changing
> > the meaning to match the current names, the latter looks like it would
> > give cleaner code.  I'm happy to do that it that sounds like the way
> > to go.
> I am all for this direction, but Richard B may have some to say here.
> BTW, will you also fix the other inconsistence you found?

I'm all to making the code clearer but I guess it will remain a
tricky area.  Each time I need to change sth there I need half an
hour to recap how all the stuff is tied together...

So please go forward.  Many bonus points if you manage to
create testcases covering the mistakes you fix.

Richard.

> Thanks,
> bin
> 
> >
> > FWIW, I came across this while trying make sense of the "+ 1" in:
> >
> >   /* Decide whether we need to create an epilogue loop to handle
> >      remaining scalar iterations.  */
> >   th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
> >         / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> >        * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> >
> > Thanks,
> > Richard
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-06-12  8:44         ` Richard Biener
@ 2017-07-03  8:07           ` Richard Sandiford
  2017-07-03  8:43             ` Richard Biener
  0 siblings, 1 reply; 9+ messages in thread
From: Richard Sandiford @ 2017-07-03  8:07 UTC (permalink / raw)
  To: Richard Biener; +Cc: Bin.Cheng, gcc-patches

Richard Biener <rguenther@suse.de> writes:
> On Mon, 12 Jun 2017, Bin.Cheng wrote:
>> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>> > "Bin.Cheng" <amker.cheng@gmail.com> writes:
>> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
>> >> <richard.sandiford@linaro.org> wrote:
>> >>> Sorry to return this old patch, but:
>> >>>
>> >>> Bin Cheng <Bin.Cheng@arm.com> writes:
>> >>>> -/* Calculate the number of iterations under which scalar loop will be
>> >>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
>> >>>> -   iterations of prolog loop.  If it's integer const, the integer
>> >>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
>> >>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
>> >>>> -   true.  This function also store upper bound of the result in BOUND.  */
>> >>>> +/* Calculate the number of iterations above which vectorized loop will be
>> >>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
>> >>>> +   of prolog loop.  If it's integer const, the integer number is also passed
>> >>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
>> >>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
>> >>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
>> >>>> +   (rather than vectorized) loop will be executed.  This function stores
>> >>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
>> >>>>
>> >>>>  static tree
>> >>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
>> >>>> -                          int bound_prolog, int vf, int th, int *bound,
>> >>>> -                          bool check_profitability)
>> >>>> +                          int bound_prolog, int vfm1, int th,
>> >>>> +                          int *bound_scalar, bool check_profitability)
>> >>>>  {
>> >>>>    tree type = TREE_TYPE (niters_prolog);
>> >>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
>> >>>> -                          build_int_cst (type, vf));
>> >>>> +                          build_int_cst (type, vfm1));
>> >>>>
>> >>>> -  *bound = vf + bound_prolog;
>> >>>> +  *bound_scalar = vfm1 + bound_prolog;
>> >>>>    if (check_profitability)
>> >>>>      {
>> >>>> -      th++;
>> >>>> +      /* TH indicates the minimum niters of vectorized loop, while we
>> >>>> +      compute the maximum niters of scalar loop.  */
>> >>>> +      th--;
>> >>>
>> >>> Are you sure about this last change?  It looks like it should be dropping
>> >> Hi Richard,
>> >> Thanks for spotting this.  I vaguely remember I got this from the way
>> >> how niter/th was checked in previous peeling code, but did't double
>> >> check it now.  I tend to believe there is inconsistence about th,
>> >> especially with comment like:
>> >>
>> >>   /* Threshold of number of iterations below which vectorzation will not be
>> >>      performed. It is calculated from MIN_PROFITABLE_ITERS and
>> >>      PARAM_MIN_VECT_LOOP_BOUND. */
>> >>   unsigned int th;
>> >>
>> >> I also tend to believe the inconsistence was introduced partly because
>> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
>> >> in rest of the compiler.
>> >>
>> >> and...,
>> >>
>> >>> the increment rather than replacing it with a decrement.
>> >>>
>> >>> It looks like the threshold is already the maximum niters for the scalar
>> >>> loop.  It's set by:
>> >>>
>> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
>> >>>                             * vectorization_factor) - 1);
>> >>>
>> >>>   /* Use the cost model only if it is more conservative than user specified
>> >>>      threshold.  */
>> >>>   th = (unsigned) min_scalar_loop_bound;
>> >>>   if (min_profitable_iters
>> >>>       && (!min_scalar_loop_bound
>> >>>           || min_profitable_iters > min_scalar_loop_bound))
>> >>>     th = (unsigned) min_profitable_iters;
>> >>>
>> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>> >>>
>> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
>> >>> is the minimum niters for the vector loop.)
>> >> To be honest, min_scalar_loop_bound is more likely for something's
>> >> lower bound which is the niters for the vector loop.  If it refers to
>> >> the niters scalar loop, it is in actuality the "max" value we should
>> >> use.  I am not quite sure here, partly because I am not a native
>> >> speaker.
>> >>
>> >>>
>> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
>> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
>> >>> instead returns the largest niters for which the scalar loop should be
>> >>> preferred:
>> >>>
>> >>>   /* Cost model disabled.  */
>> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
>> >>>     {
>> >>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
>> >>>       *ret_min_profitable_niters = 0;
>> >>>       *ret_min_profitable_estimate = 0;
>> >>>       return;
>> >>>     }
>> >>>   [...]
>> >>>   /* Because the condition we create is:
>> >>>      if (niters <= min_profitable_iters)
>> >>>        then skip the vectorized loop.  */
>> >>>   min_profitable_iters--;
>> >>>   [...]
>> >>>   min_profitable_estimate --;
>> >>>
>> >>> Also, when deciding whether the threshold needs to be used, we have:
>> >>>
>> >>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
>> >>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
>> >>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>> >>>     {
>> >>>       if (dump_enabled_p ())
>> >>>         dump_printf_loc (MSG_NOTE, vect_location,
>> >>>                          "Profitability threshold is %d loop iterations.\n",
>> >>>                          th);
>> >>>       check_profitability = true;
>> >>>     }
>> >> Yes, this indeed indicates that th refers to the maximum niters of the
>> >> scalar loop.  Anyway, we should resolve the inconsistence and make it
>> >> more clear in variable names etc..
>> >
>> > Yeah, agree the variable names are really confusing.  Given the choice
>> > between changing the names to match the current meaning and changing
>> > the meaning to match the current names, the latter looks like it would
>> > give cleaner code.  I'm happy to do that it that sounds like the way
>> > to go.
>> I am all for this direction, but Richard B may have some to say here.
>> BTW, will you also fix the other inconsistence you found?
>
> I'm all to making the code clearer but I guess it will remain a
> tricky area.  Each time I need to change sth there I need half an
> hour to recap how all the stuff is tied together...
>
> So please go forward.  Many bonus points if you manage to
> create testcases covering the mistakes you fix.

No testcases, sorry, but I built the patched compiler for
aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
powerpc64le-linux-gnu, compiled the testsuite with -O3,
and compared the result with:

diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c
--- gcc/tree-vect-loop.c	2017-07-01 16:18:55.951581540 +0100
+++ /hdd/richards/compare/old/gcc/tree-vect-loop.c	2017-07-01 16:00:20.266729964 +0100
@@ -2247,8 +2247,10 @@
       /* Niters for at least one iteration of vectorized loop.  */
       niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       /* One additional iteration because of peeling for gap.  */
-      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
 	niters_th++;
+      /* Convert to maximum number of scalar iters.  */
+      niters_th--;
       if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
 	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
     }
diff -ur gcc/tree-vect-loop-manip.c /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c
--- gcc/tree-vect-loop-manip.c	2017-07-01 16:18:55.951581540 +0100
+++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c	2017-07-01 14:00:26.775542411 +0100
@@ -1143,9 +1143,6 @@
   *bound_scalar = vfm1 + bound_prolog;
   if (check_profitability)
     {
-      /* TH indicates the minimum niters of vectorized loop, while we
-	 compute the maximum niters of scalar loop.  */
-      th--;
       /* Peeling for constant times.  */
       if (int_niters_prolog >= 0)
 	{

which is the patch that would make everything consistent with the
current meaning (where the "minimum number of iterations" for which
vectorisation was profitable were actually the maximum number of
iterations for which vectorisation wasn't profitable).

Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu.
OK to install?

Richard


2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound,
	min_profitable_iters, and th as inclusive lower bounds.
	Fix LOOP_VINFO_PEELING_FOR_GAPS condition.
	(vect_estimate_min_profitable_iters): Return inclusive lower bounds
	for min_profitable_iters and min_profitable_estimate.
	(vect_transform_loop): Treat th as an inclusive lower bound.
	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-07-03 08:42:51.149380545 +0100
+++ gcc/tree-vect-loop.c	2017-07-03 08:50:43.507370205 +0100
@@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_
       goto again;
     }
 
-  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
-			    * vectorization_factor) - 1);
+  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
+			   * vectorization_factor);
 
   /* Use the cost model only if it is more conservative than user specified
      threshold.  */
-  th = (unsigned) min_scalar_loop_bound;
-  if (min_profitable_iters
-      && (!min_scalar_loop_bound
-          || min_profitable_iters > min_scalar_loop_bound))
-    th = (unsigned) min_profitable_iters;
+  th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
 
   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-      && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
+      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
     estimated_niter = max_niter;
   if (estimated_niter != -1
       && ((unsigned HOST_WIDE_INT) estimated_niter
-          <= MAX (th, (unsigned)min_profitable_estimate)))
+          < MAX (th, (unsigned) min_profitable_estimate)))
     {
       if (dump_enabled_p ())
 	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
 
   /* Decide whether we need to create an epilogue loop to handle
      remaining scalar iterations.  */
-  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
-        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
-       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
+  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
+	 / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
+	* LOOP_VINFO_VECT_FACTOR (loop_vinfo));
 
   if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
       && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
@@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
       /* Niters for at least one iteration of vectorized loop.  */
       niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
       /* One additional iteration because of peeling for gap.  */
-      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
 	niters_th++;
       if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
 	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
@@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop
   if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
     {
       if (vec_outside_cost <= 0)
-        min_profitable_iters = 1;
+        min_profitable_iters = 0;
       else
         {
           min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
@@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop
   min_profitable_iters =
 	min_profitable_iters < vf ? vf : min_profitable_iters;
 
-  /* Because the condition we create is:
-     if (niters <= min_profitable_iters)
-       then skip the vectorized loop.  */
-  min_profitable_iters--;
-
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
                      "  Runtime profitability threshold = %d\n",
@@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop
      SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC  */
 
   if (vec_outside_cost <= 0)
-    min_profitable_estimate = 1;
+    min_profitable_estimate = 0;
   else
     {
       min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
@@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop
 				 / ((scalar_single_iter_cost * vf)
 				   - vec_inside_cost);
     }
-  min_profitable_estimate --;
   min_profitable_estimate = MAX (min_profitable_estimate, min_profitable_iters);
   if (dump_enabled_p ())
     dump_printf_loc (MSG_NOTE, vect_location,
@@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_
      run at least the vectorization factor number of times checking
      is pointless, too.  */
   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
-  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
+  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
       if (dump_enabled_p ())
Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c	2017-07-03 08:42:46.863565062 +0100
+++ gcc/tree-vect-loop-manip.c	2017-07-03 08:50:43.506370248 +0100
@@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
 
   if (check_profitability)
-    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
+    cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
 			     build_int_cst (TREE_TYPE (scalar_loop_iters),
 						       th));
 

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop
  2017-07-03  8:07           ` Richard Sandiford
@ 2017-07-03  8:43             ` Richard Biener
  0 siblings, 0 replies; 9+ messages in thread
From: Richard Biener @ 2017-07-03  8:43 UTC (permalink / raw)
  To: Richard Sandiford; +Cc: Bin.Cheng, gcc-patches

On Mon, 3 Jul 2017, Richard Sandiford wrote:

> Richard Biener <rguenther@suse.de> writes:
> > On Mon, 12 Jun 2017, Bin.Cheng wrote:
> >> On Mon, Jun 12, 2017 at 9:19 AM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >> > "Bin.Cheng" <amker.cheng@gmail.com> writes:
> >> >> On Sat, Jun 10, 2017 at 10:40 AM, Richard Sandiford
> >> >> <richard.sandiford@linaro.org> wrote:
> >> >>> Sorry to return this old patch, but:
> >> >>>
> >> >>> Bin Cheng <Bin.Cheng@arm.com> writes:
> >> >>>> -/* Calculate the number of iterations under which scalar loop will be
> >> >>>> -   preferred than vectorized loop.  NITERS_PROLOG is the number of
> >> >>>> -   iterations of prolog loop.  If it's integer const, the integer
> >> >>>> -   number is also passed by INT_NITERS_PROLOG.  VF is vector factor;
> >> >>>> -   TH is the threshold for vectorized loop if CHECK_PROFITABILITY is
> >> >>>> -   true.  This function also store upper bound of the result in BOUND.  */
> >> >>>> +/* Calculate the number of iterations above which vectorized loop will be
> >> >>>> +   preferred than scalar loop.  NITERS_PROLOG is the number of iterations
> >> >>>> +   of prolog loop.  If it's integer const, the integer number is also passed
> >> >>>> +   in INT_NITERS_PROLOG.  BOUND_PROLOG is the upper bound (included) of
> >> >>>> +   number of iterations of prolog loop.  VFM1 is vector factor minus one.
> >> >>>> +   If CHECK_PROFITABILITY is true, TH is the threshold below which scalar
> >> >>>> +   (rather than vectorized) loop will be executed.  This function stores
> >> >>>> +   upper bound (included) of the result in BOUND_SCALAR.  */
> >> >>>>
> >> >>>>  static tree
> >> >>>>  vect_gen_scalar_loop_niters (tree niters_prolog, int int_niters_prolog,
> >> >>>> -                          int bound_prolog, int vf, int th, int *bound,
> >> >>>> -                          bool check_profitability)
> >> >>>> +                          int bound_prolog, int vfm1, int th,
> >> >>>> +                          int *bound_scalar, bool check_profitability)
> >> >>>>  {
> >> >>>>    tree type = TREE_TYPE (niters_prolog);
> >> >>>>    tree niters = fold_build2 (PLUS_EXPR, type, niters_prolog,
> >> >>>> -                          build_int_cst (type, vf));
> >> >>>> +                          build_int_cst (type, vfm1));
> >> >>>>
> >> >>>> -  *bound = vf + bound_prolog;
> >> >>>> +  *bound_scalar = vfm1 + bound_prolog;
> >> >>>>    if (check_profitability)
> >> >>>>      {
> >> >>>> -      th++;
> >> >>>> +      /* TH indicates the minimum niters of vectorized loop, while we
> >> >>>> +      compute the maximum niters of scalar loop.  */
> >> >>>> +      th--;
> >> >>>
> >> >>> Are you sure about this last change?  It looks like it should be dropping
> >> >> Hi Richard,
> >> >> Thanks for spotting this.  I vaguely remember I got this from the way
> >> >> how niter/th was checked in previous peeling code, but did't double
> >> >> check it now.  I tend to believe there is inconsistence about th,
> >> >> especially with comment like:
> >> >>
> >> >>   /* Threshold of number of iterations below which vectorzation will not be
> >> >>      performed. It is calculated from MIN_PROFITABLE_ITERS and
> >> >>      PARAM_MIN_VECT_LOOP_BOUND. */
> >> >>   unsigned int th;
> >> >>
> >> >> I also tend to believe the inconsistence was introduced partly because
> >> >> niters in vectorizer stands for latch_niters + 1, while latch_niters
> >> >> in rest of the compiler.
> >> >>
> >> >> and...,
> >> >>
> >> >>> the increment rather than replacing it with a decrement.
> >> >>>
> >> >>> It looks like the threshold is already the maximum niters for the scalar
> >> >>> loop.  It's set by:
> >> >>>
> >> >>>   min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> >> >>>                             * vectorization_factor) - 1);
> >> >>>
> >> >>>   /* Use the cost model only if it is more conservative than user specified
> >> >>>      threshold.  */
> >> >>>   th = (unsigned) min_scalar_loop_bound;
> >> >>>   if (min_profitable_iters
> >> >>>       && (!min_scalar_loop_bound
> >> >>>           || min_profitable_iters > min_scalar_loop_bound))
> >> >>>     th = (unsigned) min_profitable_iters;
> >> >>>
> >> >>>   LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
> >> >>>
> >> >>> (Note the "- 1" in the min_scalar_loop_bound.  The multiplication result
> >> >>> is the minimum niters for the vector loop.)
> >> >> To be honest, min_scalar_loop_bound is more likely for something's
> >> >> lower bound which is the niters for the vector loop.  If it refers to
> >> >> the niters scalar loop, it is in actuality the "max" value we should
> >> >> use.  I am not quite sure here, partly because I am not a native
> >> >> speaker.
> >> >>
> >> >>>
> >> >>> min_profitable_iters sounds like it _ought_ to be the minimum niters for
> >> >>> which the vector loop is used, but vect_estimate_min_profitable_iters
> >> >>> instead returns the largest niters for which the scalar loop should be
> >> >>> preferred:
> >> >>>
> >> >>>   /* Cost model disabled.  */
> >> >>>   if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
> >> >>>     {
> >> >>>       dump_printf_loc (MSG_NOTE, vect_location, "cost model disabled.\n");
> >> >>>       *ret_min_profitable_niters = 0;
> >> >>>       *ret_min_profitable_estimate = 0;
> >> >>>       return;
> >> >>>     }
> >> >>>   [...]
> >> >>>   /* Because the condition we create is:
> >> >>>      if (niters <= min_profitable_iters)
> >> >>>        then skip the vectorized loop.  */
> >> >>>   min_profitable_iters--;
> >> >>>   [...]
> >> >>>   min_profitable_estimate --;
> >> >>>
> >> >>> Also, when deciding whether the threshold needs to be used, we have:
> >> >>>
> >> >>>   th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> >> >>>   if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> >> >>>       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
> >> >>>     {
> >> >>>       if (dump_enabled_p ())
> >> >>>         dump_printf_loc (MSG_NOTE, vect_location,
> >> >>>                          "Profitability threshold is %d loop iterations.\n",
> >> >>>                          th);
> >> >>>       check_profitability = true;
> >> >>>     }
> >> >> Yes, this indeed indicates that th refers to the maximum niters of the
> >> >> scalar loop.  Anyway, we should resolve the inconsistence and make it
> >> >> more clear in variable names etc..
> >> >
> >> > Yeah, agree the variable names are really confusing.  Given the choice
> >> > between changing the names to match the current meaning and changing
> >> > the meaning to match the current names, the latter looks like it would
> >> > give cleaner code.  I'm happy to do that it that sounds like the way
> >> > to go.
> >> I am all for this direction, but Richard B may have some to say here.
> >> BTW, will you also fix the other inconsistence you found?
> >
> > I'm all to making the code clearer but I guess it will remain a
> > tricky area.  Each time I need to change sth there I need half an
> > hour to recap how all the stuff is tied together...
> >
> > So please go forward.  Many bonus points if you manage to
> > create testcases covering the mistakes you fix.
> 
> No testcases, sorry, but I built the patched compiler for
> aarch64-linux-gnu, arm-linux-gnueabi, x86_64-linux-gnu and
> powerpc64le-linux-gnu, compiled the testsuite with -O3,
> and compared the result with:
> 
> diff -ur gcc/tree-vect-loop.c /hdd/richards/compare/old/gcc/tree-vect-loop.c
> --- gcc/tree-vect-loop.c	2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop.c	2017-07-01 16:00:20.266729964 +0100
> @@ -2247,8 +2247,10 @@
>        /* Niters for at least one iteration of vectorized loop.  */
>        niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>        /* One additional iteration because of peeling for gap.  */
> -      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>  	niters_th++;
> +      /* Convert to maximum number of scalar iters.  */
> +      niters_th--;
>        if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
>  	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
>      }
> diff -ur gcc/tree-vect-loop-manip.c /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c
> --- gcc/tree-vect-loop-manip.c	2017-07-01 16:18:55.951581540 +0100
> +++ /hdd/richards/compare/old/gcc/tree-vect-loop-manip.c	2017-07-01 14:00:26.775542411 +0100
> @@ -1143,9 +1143,6 @@
>    *bound_scalar = vfm1 + bound_prolog;
>    if (check_profitability)
>      {
> -      /* TH indicates the minimum niters of vectorized loop, while we
> -	 compute the maximum niters of scalar loop.  */
> -      th--;
>        /* Peeling for constant times.  */
>        if (int_niters_prolog >= 0)
>  	{
> 
> which is the patch that would make everything consistent with the
> current meaning (where the "minimum number of iterations" for which
> vectorisation was profitable were actually the maximum number of
> iterations for which vectorisation wasn't profitable).
> 
> Also tested normally on aarch64-linux-gnu and x86_64-linux-gnu.
> OK to install?

Ok.

Thanks,
Richard.

> Richard
> 
> 
> 2017-07-03  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* tree-vect-loop.c (vect_analyze_loop_2): Treat min_scalar_loop_bound,
> 	min_profitable_iters, and th as inclusive lower bounds.
> 	Fix LOOP_VINFO_PEELING_FOR_GAPS condition.
> 	(vect_estimate_min_profitable_iters): Return inclusive lower bounds
> 	for min_profitable_iters and min_profitable_estimate.
> 	(vect_transform_loop): Treat th as an inclusive lower bound.
> 	* tree-vect-loop-manip.c (vect_loop_versioning): Likewise.
> 
> Index: gcc/tree-vect-loop.c
> ===================================================================
> --- gcc/tree-vect-loop.c	2017-07-03 08:42:51.149380545 +0100
> +++ gcc/tree-vect-loop.c	2017-07-03 08:50:43.507370205 +0100
> @@ -2132,21 +2132,17 @@ vect_analyze_loop_2 (loop_vec_info loop_
>        goto again;
>      }
>  
> -  min_scalar_loop_bound = ((PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> -			    * vectorization_factor) - 1);
> +  min_scalar_loop_bound = (PARAM_VALUE (PARAM_MIN_VECT_LOOP_BOUND)
> +			   * vectorization_factor);
>  
>    /* Use the cost model only if it is more conservative than user specified
>       threshold.  */
> -  th = (unsigned) min_scalar_loop_bound;
> -  if (min_profitable_iters
> -      && (!min_scalar_loop_bound
> -          || min_profitable_iters > min_scalar_loop_bound))
> -    th = (unsigned) min_profitable_iters;
> +  th = (unsigned) MAX (min_scalar_loop_bound, min_profitable_iters);
>  
>    LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = th;
>  
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
> -      && LOOP_VINFO_INT_NITERS (loop_vinfo) <= th)
> +      && LOOP_VINFO_INT_NITERS (loop_vinfo) < th)
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2165,7 +2161,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>      estimated_niter = max_niter;
>    if (estimated_niter != -1
>        && ((unsigned HOST_WIDE_INT) estimated_niter
> -          <= MAX (th, (unsigned)min_profitable_estimate)))
> +          < MAX (th, (unsigned) min_profitable_estimate)))
>      {
>        if (dump_enabled_p ())
>  	dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> @@ -2182,9 +2178,9 @@ vect_analyze_loop_2 (loop_vec_info loop_
>  
>    /* Decide whether we need to create an epilogue loop to handle
>       remaining scalar iterations.  */
> -  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)
> -        / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> -       * LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo)
> +	 / LOOP_VINFO_VECT_FACTOR (loop_vinfo))
> +	* LOOP_VINFO_VECT_FACTOR (loop_vinfo));
>  
>    if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
>        && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
> @@ -2247,7 +2243,7 @@ vect_analyze_loop_2 (loop_vec_info loop_
>        /* Niters for at least one iteration of vectorized loop.  */
>        niters_th += LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>        /* One additional iteration because of peeling for gap.  */
> -      if (!LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
> +      if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
>  	niters_th++;
>        if (LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) < niters_th)
>  	LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) = niters_th;
> @@ -3545,7 +3541,7 @@ vect_estimate_min_profitable_iters (loop
>    if ((scalar_single_iter_cost * vf) > (int) vec_inside_cost)
>      {
>        if (vec_outside_cost <= 0)
> -        min_profitable_iters = 1;
> +        min_profitable_iters = 0;
>        else
>          {
>            min_profitable_iters = ((vec_outside_cost - scalar_outside_cost) * vf
> @@ -3586,11 +3582,6 @@ vect_estimate_min_profitable_iters (loop
>    min_profitable_iters =
>  	min_profitable_iters < vf ? vf : min_profitable_iters;
>  
> -  /* Because the condition we create is:
> -     if (niters <= min_profitable_iters)
> -       then skip the vectorized loop.  */
> -  min_profitable_iters--;
> -
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
>                       "  Runtime profitability threshold = %d\n",
> @@ -3606,7 +3597,7 @@ vect_estimate_min_profitable_iters (loop
>       SIC * niters > VIC * ((niters-PL_ITERS-EP_ITERS)/VF) + VOC + SOC  */
>  
>    if (vec_outside_cost <= 0)
> -    min_profitable_estimate = 1;
> +    min_profitable_estimate = 0;
>    else
>      {
>        min_profitable_estimate = ((vec_outside_cost + scalar_outside_cost) * vf
> @@ -3615,7 +3606,6 @@ vect_estimate_min_profitable_iters (loop
>  				 / ((scalar_single_iter_cost * vf)
>  				   - vec_inside_cost);
>      }
> -  min_profitable_estimate --;
>    min_profitable_estimate = MAX (min_profitable_estimate, min_profitable_iters);
>    if (dump_enabled_p ())
>      dump_printf_loc (MSG_NOTE, vect_location,
> @@ -7180,7 +7170,7 @@ vect_transform_loop (loop_vec_info loop_
>       run at least the vectorization factor number of times checking
>       is pointless, too.  */
>    th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
> -  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo) - 1
> +  if (th >= LOOP_VINFO_VECT_FACTOR (loop_vinfo)
>        && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
>      {
>        if (dump_enabled_p ())
> Index: gcc/tree-vect-loop-manip.c
> ===================================================================
> --- gcc/tree-vect-loop-manip.c	2017-07-03 08:42:46.863565062 +0100
> +++ gcc/tree-vect-loop-manip.c	2017-07-03 08:50:43.506370248 +0100
> @@ -2142,7 +2142,7 @@ vect_loop_versioning (loop_vec_info loop
>    bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
>  
>    if (check_profitability)
> -    cond_expr = fold_build2 (GT_EXPR, boolean_type_node, scalar_loop_iters,
> +    cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
>  			     build_int_cst (TREE_TYPE (scalar_loop_iters),
>  						       th));
>  
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2017-07-03  8:43 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-10-19 10:39 [PATCH PR78005]Fix miscompare issue by computing correct guard condition for vectorized loop Bin Cheng
2016-10-19 10:41 ` Richard Biener
2017-06-10  9:40 ` Richard Sandiford
2017-06-11 22:24   ` Bin.Cheng
2017-06-12  8:19     ` Richard Sandiford
2017-06-12  8:32       ` Bin.Cheng
2017-06-12  8:44         ` Richard Biener
2017-07-03  8:07           ` Richard Sandiford
2017-07-03  8:43             ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).