public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion
@ 2015-05-22 15:46 Alan Lawrence
  2015-05-27 16:01 ` Jeff Law
  2015-05-28 12:11 ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Alan Lawrence @ 2015-05-22 15:46 UTC (permalink / raw)
  To: gcc-patches; +Cc: Richard Biener

[-- Attachment #1: Type: text/plain, Size: 4408 bytes --]

This example which I wrote to test ifconversion, currently fails to if-convert 
or vectorize:

int foo ()
{
   for (int i = 0; i < 32 ; i++)
     {
       int m = (a[i] & i) ? 5 : 4;
       b[i] = a[i] * m;
     }
}

...because jump-threading in dom1 rearranged the loop into a form that neither 
if-conversion nor vectorization would attempt. Discussion at 
https://gcc.gnu.org/ml/gcc/2015-04/msg00343.html lead to the suggestion that I 
should rerun loop-header copying (an earlier attempt to fix ifconversion, 
https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01743.html, still did not enable 
vectorization.)

This patch does so (and makes slightly less conservative, to tackle the example 
above). I found I had to make this a separate pass, so that the phi nodes were 
cleaned up at the end of the pass before running tree_if_conversion. Also at 
this stage in the compiler (inside loop opts) it was not possible to run 
loop_optimizer_init+finalize, or other loop_optimizer data structures needed 
later would be deleted; hence, I have two nearly-but-not-quite-identical passes, 
the new "ch_vect" avoiding the init/finalize. I tried to tackle this with some 
C++ subclassing, which removes the duplication, but the result feels a little 
ugly; suggestions for any neater approach welcome.

This patch causes failure of the scan-tree-dump of dom2 in gcc.dg/ssa/pr21417.c. 
This looks for jump-threading to perform an optimization, but no longer finds 
the expected line in the log - as the loop-header-copying phase has already done 
an equivalent transformation *before* dom2. The final CFG is thus in the desired 
form, but I'm not sure how to determine this (scanning the CFG itself is very 
difficult, well beyond what we can do with regex, requiring looking at multiple 
lines and basic blocks). Can anyone advise? [The test issue can be worked around 
by preserving the old do_while_p logic for the first header-copying pass, and 
using the new logic only for the second, but this is more awkward inside the 
compiler, which feels wrong.]

Besides the new vect-ifcvt-11.c, the testsuite actually has a couple of other 
examples where this patch enables (undesired!) vectorization. I've dealt with 
these, but for the record:
	* gcc.dg/vect/slp-perm-7.c: the initialization loop in main, contained a check 
that input[i] < 200; this was already optimized out (because input[i] was set to 
i%256, where i<N with N #defined to 16), but that loop was not vectorized because:
/work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: 
note: not vectorized: latch block not empty.
/work/alalaw01/oban/srcfsf/gcc/gcc/testsuite/gcc.dg/vect/slp-perm-7.c:54:3: 
note: bad loop form.

	* gcc.dg/vect/vect-strided-a-u16-i4.c: the main1() function has three loops; 
the first (initialization) has an 'if (y) abort() /* Avoid vectorization.  */'. 
However, the 'volatile int y = 0' this was meant to reference, is actually 
shadowed by a local non-volatile; the test is thus peeled off and absent from 
the body of the loop. The loop only avoided vectorization because of non-empty 
latch and bad loop form, as previous.

With this patch, both those loops now have good form, hence I have fixed both to 
check a global volatile to prevent these extraneous parts from being vectorized.

Tested with bootstrap + check-gcc on x86_64 and AArch64 (linux). As noted above, 
this causes a spurious PASS->FAIL of a scan-tree-dump test, which I'm unsure how 
to fix, but no other regressions.

gcc/ChangeLog:

	* tree-pass.h (make_pass_ch_vect): New.
	* passes.def: Add pass_ch_vect just before pass_if_conversion.

	* tree-ssa-loop-ch.c (do_while_loop_p): For single-exit loops,
	look for blocks with exit edge and code after them.
	(pass_data_ch_vect, class pass_ch_vect, make_pass_ch_vect): New.
	(class pass_ch): Extend pass_ch_vect.
	(pass_ch::execute): Move all but loop_optimizer_init/finalize to...
	(pass_ch_vect::execute): ...here.

	* tree-ssa-loop.c (pass_tree_loop_init::execute): Add flags
	LOOPS_HAVE_PREHEADERS and LOOPS_HAVE_SIMPLE_LATCHES.

gcc/testsuite/ChangeLog:

	* gcc.dg/vect/slp-perm-7.c (zero): New.
	(main): Test zero rather than input[i], to avoid vectorization.
	* gcc.dg/vect/vect-strided-a-u16-i4.c (main1): Narrow scope of x,y,z,w.
	of unsigned
	* gcc.dg/vect/vect-ifcvt-11.c: New.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: rerun-loop-ch.patch --]
[-- Type: text/x-patch; name=rerun-loop-ch.patch, Size: 7848 bytes --]

diff --git a/gcc/passes.def b/gcc/passes.def
index 1d598b2..87cfe2a 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -247,6 +247,7 @@ along with GCC; see the file COPYING3.  If not see
 	  PUSH_INSERT_PASSES_WITHIN (pass_parallelize_loops)
 	      NEXT_PASS (pass_expand_omp_ssa);
 	  POP_INSERT_PASSES ()
+	  NEXT_PASS (pass_ch_vect);
 	  NEXT_PASS (pass_if_conversion);
 	  /* pass_vectorize must immediately follow pass_if_conversion.
 	     Please do not add any other passes in between.  */
diff --git a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
index 6291096..e0ec55a 100644
--- a/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
+++ b/gcc/testsuite/gcc.dg/vect/slp-perm-7.c
@@ -20,6 +20,8 @@
 
 #define N 16
 
+volatile int zero = 0;
+
 /* SLP with load permutation and loop-based vectorization.  */
 void foo (int *__restrict__ pInput, int *__restrict__ pOutput,
           int *__restrict__ pInput2, int *__restrict__ pOutput2)
@@ -57,7 +59,7 @@ int main (int argc, const char* argv[])
       input2[i] = i%256;
       output[i] = 0;
       output2[i] = 0;
-      if (input[i] > 200)
+      if (zero) /* Avoid vectorization.  */
         abort ();
     }
 
diff --git a/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
new file mode 100644
index 0000000..53b139d
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-ifcvt-11.c
@@ -0,0 +1,37 @@
+/* { dg-require-effective-target vect_condition } */
+/* { dg-require-effective-target vect_int } */
+
+#include "tree-vect.h"
+
+#define N 16
+
+extern void abort(void);
+
+int A[N] = {36, 39, 42, 45, 43, 32, 21, 12, 23, 34, 45, 56, 67, 78, 81, 11};
+int B[N] = {144,195,210,225,172,128,105,60, 92, 136,225,280,268,390,324,55};
+
+__attribute__((noinline))
+void foo ()
+{
+  for (int i = 0; i < N; i++)
+    {
+      int m = (A[i] & i) ? 5 : 4;
+      A[i] = A[i] * m;
+    }
+}
+
+int main ()
+{
+
+  check_vect ();
+  foo ();
+  /* check results:  */
+  for (int i = 0; i < N; i++)
+    if (A[i] != B[i])
+      abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
index 68114a6..0656fb7 100644
--- a/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
+++ b/gcc/testsuite/gcc.dg/vect/vect-strided-a-u16-i4.c
@@ -21,7 +21,6 @@ main1 ()
   s *ptr = arr;
   s res[N];
   int i;
-  unsigned short x, y, z, w;
 
   for (i = 0; i < N; i++)
     {
@@ -35,6 +34,7 @@ main1 ()
 
   for (i = 0; i < N; i++)
     {
+      unsigned short x, y, z, w;
       x = ptr->b - ptr->a;
       y = ptr->d - ptr->c;
       res[i].c = x + y;
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index bc8763d..cb0f986 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -379,6 +379,7 @@ extern gimple_opt_pass *make_pass_loop_prefetch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_iv_optimize (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_tree_loop_done (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ch (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_ch_vect (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_ccp (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_phi_only_cprop (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_build_ssa (gcc::context *ctxt);
diff --git a/gcc/tree-ssa-loop-ch.c b/gcc/tree-ssa-loop-ch.c
index c6441b8..0fb7155 100644
--- a/gcc/tree-ssa-loop-ch.c
+++ b/gcc/tree-ssa-loop-ch.c
@@ -140,6 +140,29 @@ do_while_loop_p (struct loop *loop)
       && gimple_code (stmt) == GIMPLE_COND)
     return false;
 
+  /* Leave anything with multiple exits, given it satisfies the above.  */
+  if (!single_exit (loop))
+    return true;
+
+  /* If any block in the loop has an exit edge, and code after it, it is
+     not a do-while loop.  */
+  basic_block *body = get_loop_body (loop);
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+    {
+      bool block_has_exit = false, block_precedes_code = false;
+      edge_iterator ei;
+      edge e;
+      FOR_EACH_EDGE (e, ei, body[i]->succs)
+	if (loop_exit_edge_p (loop, e))
+	  block_has_exit = true;
+	else if (e->dest != loop->header
+		 && e->dest != loop->latch)
+	  block_precedes_code = true;
+      if (block_has_exit && block_precedes_code)
+        return false;
+    }
+  free (body);
+
   return true;
 }
 
@@ -162,21 +185,48 @@ const pass_data pass_data_ch =
   0, /* todo_flags_finish */
 };
 
-class pass_ch : public gimple_opt_pass
+const pass_data pass_data_ch_vect =
+{
+  GIMPLE_PASS, /* type */
+  "ch_vect", /* name */
+  OPTGROUP_LOOP, /* optinfo_flags */
+  TV_TREE_CH, /* tv_id */
+  ( PROP_cfg | PROP_ssa ), /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  0, /* todo_flags_finish */
+};
+
+class pass_ch_vect : public gimple_opt_pass
 {
 public:
-  pass_ch (gcc::context *ctxt)
-    : gimple_opt_pass (pass_data_ch, ctxt)
+  pass_ch_vect (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_ch_vect, ctxt)
   {}
 
   /* opt_pass methods: */
   virtual bool gate (function *) { return flag_tree_ch != 0; }
   virtual unsigned int execute (function *);
+protected:
+  pass_ch_vect (const pass_data data, gcc::context *ctxt)
+    : gimple_opt_pass (data, ctxt)
+  {}
+}; // class pass_ch_vect
 
+/* Adds a call to loop_optimizer_init before it executes,
+   and loop_optimizer_finalize after.  */
+class pass_ch : public pass_ch_vect
+{
+public:
+  pass_ch (gcc::context *ctxt) : pass_ch_vect (pass_data_ch, ctxt)
+  {}
+
+  virtual unsigned int execute (function *);
 }; // class pass_ch
 
 unsigned int
-pass_ch::execute (function *fun)
+pass_ch_vect::execute (function *fun)
 {
   struct loop *loop;
   basic_block header;
@@ -186,13 +236,8 @@ pass_ch::execute (function *fun)
   unsigned bbs_size;
   bool changed = false;
 
-  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
-		       | LOOPS_HAVE_SIMPLE_LATCHES);
   if (number_of_loops (fun) <= 1)
-    {
-      loop_optimizer_finalize ();
       return 0;
-    }
 
   bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
   copied_bbs = XNEWVEC (basic_block, n_basic_blocks_for_fn (fun));
@@ -296,17 +341,35 @@ pass_ch::execute (function *fun)
       changed = true;
     }
 
-  update_ssa (TODO_update_ssa);
+  if (changed)
+    update_ssa (TODO_update_ssa);
   free (bbs);
   free (copied_bbs);
 
-  loop_optimizer_finalize ();
   return changed ? TODO_cleanup_cfg : 0;
 }
 
+unsigned int
+pass_ch::execute (function *fun)
+{
+  loop_optimizer_init (LOOPS_HAVE_PREHEADERS
+		       | LOOPS_HAVE_SIMPLE_LATCHES);
+
+  unsigned int res = pass_ch_vect::execute (fun);
+
+  loop_optimizer_finalize ();
+  return res;
+}
+
 } // anon namespace
 
 gimple_opt_pass *
+make_pass_ch_vect (gcc::context *ctxt)
+{
+  return new pass_ch_vect (ctxt);
+}
+
+gimple_opt_pass *
 make_pass_ch (gcc::context *ctxt)
 {
   return new pass_ch (ctxt);
diff --git a/gcc/tree-ssa-loop.c b/gcc/tree-ssa-loop.c
index ccb8f97..a5029bb 100644
--- a/gcc/tree-ssa-loop.c
+++ b/gcc/tree-ssa-loop.c
@@ -234,7 +234,9 @@ unsigned int
 pass_tree_loop_init::execute (function *fun ATTRIBUTE_UNUSED)
 {
   loop_optimizer_init (LOOPS_NORMAL
-		       | LOOPS_HAVE_RECORDED_EXITS);
+		       | LOOPS_HAVE_RECORDED_EXITS
+		       | LOOPS_HAVE_PREHEADERS
+		       | LOOPS_HAVE_SIMPLE_LATCHES);
   rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
 
   /* We might discover new loops, e.g. when turning irreducible

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-01 17:06 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-22 15:46 [PATCH/RFC] Make loop-header-copying more aggressive, rerun before tree-if-conversion Alan Lawrence
2015-05-27 16:01 ` Jeff Law
2015-06-19 17:54   ` Alan Lawrence
2015-06-23 20:34     ` Jeff Law
2015-06-30 16:10       ` Alan Lawrence
2015-07-01 17:06         ` Jeff Law
2015-05-28 12:11 ` Richard Biener
2015-06-19 17:38   ` Alan Lawrence

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).