public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Yet another simple fix to enhance outer-loop vectorization.
@ 2015-06-08 10:43 Yuri Rumyantsev
  2015-06-09 13:28 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Yuri Rumyantsev @ 2015-06-08 10:43 UTC (permalink / raw)
  To: gcc-patches, Richard Biener, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 1036 bytes --]

Hi All,

Here is a simple fix which allows duplication of outer loops to
perform peeling for number of iterations if outer loop is marked with
pragma omp simd.

Bootstrap and regression testing did not show any new failures.
Is it OK for trunk?

ChangeLog:

2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
to allow renaming of PHI arguments on edges incoming from outer
loop header, add corresponding check before start PHI iterator.
(slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
with true force_vectorize.  Set-up dominator for outer loop too.
Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
(slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
was marked with force_vectorize and has restricted cfg.
* tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
for outer loops.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-outer-simd-2.c: New test.

[-- Attachment #2: patch.1 --]
[-- Type: application/octet-stream, Size: 5942 bytes --]

Index: testsuite/gcc.dg/vect/vect-outer-simd-2.c
===================================================================
--- testsuite/gcc.dg/vect/vect-outer-simd-2.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-outer-simd-2.c	(working copy)
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+                         float *vx, float *vy)
+{
+  int j;
+    for (j = 0; j < N; ++j)
+    {
+        const float dx  = cx - px[j];
+        const float dy  = cy - py[j];
+        *vx               -= dx * tx[j];
+        *vy               -= dy * ty[j];
+    }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+  int i;
+#pragma omp simd
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+  volatile int i;
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+  float *X = (float*)malloc (N * 8 * sizeof (float));
+  int i;
+  int n = N - 1;
+  check_vect ();
+  px = &X[0];
+  py = &X[N * 1];
+  tx = &X[N * 2];
+  ty = &X[N * 3];
+  x1 = &X[N * 4];
+  z1 = &X[N * 5];
+  t1 = &X[N * 6];
+  t2 = &X[N * 7];
+
+  for (i=0; i<N; i++)
+    {
+      px[i] = (float) (i+2);
+      tx[i] = (float) (i+1);
+      py[i] = (float) (i+4);
+      ty[i] = (float) (i+3);
+      x1[i] = z1[i] = 1.0f;
+    }
+  foo1 (n);  /* vector variant.  */
+  for (i=0; i<N;i++)
+    {
+      t1[i] = x1[i]; x1[i] = 1.0f;
+      t2[i] = z1[i]; z1[i] = 1.0f;
+    }
+  foo2 (n);  /* scalar variant.  */
+  for (i=0; i<N; i++)
+    if (x1[i] != t1[i] || z1[i] != t2[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c	(revision 224100)
+++ tree-vect-loop-manip.c	(working copy)
@@ -97,10 +97,12 @@
 }
 
 
-/* Renames the variables in basic block BB.  */
+/* Renames the variables in basic block BB.  Allow renaming  of PHI argumnets
+   on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+   true.  */
 
 static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
 {
   gimple stmt;
   use_operand_p use_p;
@@ -108,7 +110,14 @@
   edge e;
   edge_iterator ei;
   struct loop *loop = bb->loop_father;
+  struct loop *outer_loop = NULL;
 
+  if (rename_from_outer_loop)
+    {
+      gcc_assert (loop);
+      outer_loop = loop_outer (loop);
+    }
+
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
        gsi_next (&gsi))
     {
@@ -119,7 +128,8 @@
 
   FOR_EACH_EDGE (e, ei, bb->preds)
     {
-      if (!flow_bb_inside_loop_p (loop, e->src))
+      if (!flow_bb_inside_loop_p (loop, e->src)
+	  && (!rename_from_outer_loop || e->src != outer_loop->header))
 	continue;
       for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	   gsi_next (&gsi))
@@ -775,6 +785,7 @@
   bool was_imm_dom;
   basic_block exit_dest;
   edge exit, new_exit;
+  bool duplicate_outer_loop = false;
 
   exit = single_exit (loop);
   at_exit = (e == exit);
@@ -786,7 +797,10 @@
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+  /* Allow duplication of outer loops if they are marked with pragma
+     omp simd.  */
+  if (scalar_loop->force_vectorize && scalar_loop->inner)
+    duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
   if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
@@ -855,7 +869,7 @@
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom)
+      if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -898,7 +912,7 @@
     }
 
   for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
-    rename_variables_in_bb (new_bbs[i]);
+    rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
   if (scalar_loop != loop)
     {
@@ -985,7 +999,10 @@
    (3) it is single entry, single exit
    (4) its exit condition is the last stmt in the header
    (5) E is the entry/exit edge of LOOP.
- */
+   Allow duplication of outer loops if:
+   (1') it is marked with force_vectorize flag.
+   (2') it consists of exactly 5 basic blocks.
+   Other conditions are taken above.  */
 
 bool
 slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
@@ -995,6 +1012,11 @@
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
 
+  if (loop->inner && loop->force_vectorize && loop->num_nodes == 5
+      && single_exit (loop) && (e == exit_e || e == entry_e)
+      && orig_cond && orig_cond == gsi_stmt (loop_exit_gsi))
+    return true;
+
   if (loop->inner
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c	(revision 224100)
+++ tree-vect-loop.c	(working copy)
@@ -1879,6 +1879,10 @@
       return false;
     }
 
+  /* Peeling for alignment is not supported for outer-loop vectorization.  */
+  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
+    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
+
   /* Decide whether we need to create an epilogue loop to handle
      remaining scalar iterations.  */
   th = ((LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo) + 1)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-08 10:43 [PATCH] Yet another simple fix to enhance outer-loop vectorization Yuri Rumyantsev
@ 2015-06-09 13:28 ` Richard Biener
  2015-06-16 14:39   ` Yuri Rumyantsev
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-06-09 13:28 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> Here is a simple fix which allows duplication of outer loops to
> perform peeling for number of iterations if outer loop is marked with
> pragma omp simd.
>
> Bootstrap and regression testing did not show any new failures.
> Is it OK for trunk?

Hmm, I don't remember needing to adjust rename_variables_in_bb
when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
on non-innermost loops...  (I just copied, I never called
slpeel_can_duplicate_loop_p though).

So - you should just remove the loop->inner condition from
slpeel_can_duplicate_loop_p as it is used by non-vectorizer
code as well (yeah, I never merged the nested loop support
for loop distribution...).

Index: tree-vect-loop.c
===================================================================
--- tree-vect-loop.c    (revision 224100)
+++ tree-vect-loop.c    (working copy)
@@ -1879,6 +1879,10 @@
       return false;
     }

+  /* Peeling for alignment is not supported for outer-loop vectorization.  */
+  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
+    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;

I think you can't simply do this - if vect_enhance_data_refs_alignment
decided to peel for alignment then it has adjusted the DRs alignment
info already.  So instead of the above simply disallow peeling for
alignment in vect_enhance_data_refs_alignment?  Thus add
|| ->inner to

  /* Check if we can possibly peel the loop.  */
  if (!vect_can_advance_ivs_p (loop_vinfo)
      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
    do_peeling = false;

?

I also can't see why the improvement has to be gated on force_vect,
it surely looks profitable to enable more outer loop vectorization in
general, no?

How do the cost model calculations end up with peeling the outer loop
for niter?

On targets which don't support unaligned accesses we're left with
versioning for alignment.  Isn't peeling for alignment better there?
Thus only disallow peeling for alignment if there is no unhandled
alignment?

Thanks,
Richard.

> ChangeLog:
>
> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
> to allow renaming of PHI arguments on edges incoming from outer
> loop header, add corresponding check before start PHI iterator.
> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
> with true force_vectorize.  Set-up dominator for outer loop too.
> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
> was marked with force_vectorize and has restricted cfg.
> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
> for outer loops.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-outer-simd-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-09 13:28 ` Richard Biener
@ 2015-06-16 14:39   ` Yuri Rumyantsev
  2015-06-17 12:28     ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Yuri Rumyantsev @ 2015-06-16 14:39 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 4849 bytes --]

Thanks a lot Richard for your review.

I presented updated patch which is not gated by force_vectorize.
I added test on outer-loop in vect_enhance_data_refs_alignment
and it returns false for it because we can not improve dr alighment
through outer-loop peeling in general. So I assume that only
versioning for alignment can be applied for targets do not support
unaligned memory access.

I did not change tests for outer loops in slpeel_can_duplicate_loop_p
as you proposed since it is not called outside vectorization.

I also noticed one not-resolved issue with outer-loop peeling - we don't
consider remainder for possible vectorization of inner-loop as we can see
on the following example:

  for (i = 0; i < n; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }

Is it worth to fix it?

2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
to allow renaming of PHI arguments on edges incoming from outer
loop header, add corresponding check before start PHI iterator.
(slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
with true force_vectorize.  Set-up dominator for outer loop too.
Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
(slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
was marked with force_vectorize and has restricted cfg.
* tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
not be reachable for outer loops.
(vect_enhance_data_refs_alignment): Add test on true value of
do_peeling.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-outer-simd-2.c: New test.

2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi All,
>>
>> Here is a simple fix which allows duplication of outer loops to
>> perform peeling for number of iterations if outer loop is marked with
>> pragma omp simd.
>>
>> Bootstrap and regression testing did not show any new failures.
>> Is it OK for trunk?
>
> Hmm, I don't remember needing to adjust rename_variables_in_bb
> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
> on non-innermost loops...  (I just copied, I never called
> slpeel_can_duplicate_loop_p though).
>
> So - you should just remove the loop->inner condition from
> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
> code as well (yeah, I never merged the nested loop support
> for loop distribution...).
>
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c    (revision 224100)
> +++ tree-vect-loop.c    (working copy)
> @@ -1879,6 +1879,10 @@
>        return false;
>      }
>
> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>
> I think you can't simply do this - if vect_enhance_data_refs_alignment
> decided to peel for alignment then it has adjusted the DRs alignment
> info already.  So instead of the above simply disallow peeling for
> alignment in vect_enhance_data_refs_alignment?  Thus add
> || ->inner to
>
>   /* Check if we can possibly peel the loop.  */
>   if (!vect_can_advance_ivs_p (loop_vinfo)
>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>     do_peeling = false;
>
> ?
>
> I also can't see why the improvement has to be gated on force_vect,
> it surely looks profitable to enable more outer loop vectorization in
> general, no?
>
> How do the cost model calculations end up with peeling the outer loop
> for niter?
>
> On targets which don't support unaligned accesses we're left with
> versioning for alignment.  Isn't peeling for alignment better there?
> Thus only disallow peeling for alignment if there is no unhandled
> alignment?
>
> Thanks,
> Richard.
>
>> ChangeLog:
>>
>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>> to allow renaming of PHI arguments on edges incoming from outer
>> loop header, add corresponding check before start PHI iterator.
>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>> with true force_vectorize.  Set-up dominator for outer loop too.
>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>> was marked with force_vectorize and has restricted cfg.
>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>> for outer loops.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

[-- Attachment #2: patch.1.2 --]
[-- Type: application/octet-stream, Size: 6291 bytes --]

Index: testsuite/gcc.dg/vect/vect-outer-simd-2.c
===================================================================
--- testsuite/gcc.dg/vect/vect-outer-simd-2.c	(revision 0)
+++ testsuite/gcc.dg/vect/vect-outer-simd-2.c	(working copy)
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+                         float *vx, float *vy)
+{
+  int j;
+    for (j = 0; j < N; ++j)
+    {
+        const float dx  = cx - px[j];
+        const float dy  = cy - py[j];
+        *vx               -= dx * tx[j];
+        *vy               -= dy * ty[j];
+    }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+  int i;
+#pragma omp simd
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+  volatile int i;
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+  float *X = (float*)malloc (N * 8 * sizeof (float));
+  int i;
+  int n = N - 1;
+  check_vect ();
+  px = &X[0];
+  py = &X[N * 1];
+  tx = &X[N * 2];
+  ty = &X[N * 3];
+  x1 = &X[N * 4];
+  z1 = &X[N * 5];
+  t1 = &X[N * 6];
+  t2 = &X[N * 7];
+
+  for (i=0; i<N; i++)
+    {
+      px[i] = (float) (i+2);
+      tx[i] = (float) (i+1);
+      py[i] = (float) (i+4);
+      ty[i] = (float) (i+3);
+      x1[i] = z1[i] = 1.0f;
+    }
+  foo1 (n);  /* vector variant.  */
+  for (i=0; i<N;i++)
+    {
+      t1[i] = x1[i]; x1[i] = 1.0f;
+      t2[i] = z1[i]; z1[i] = 1.0f;
+    }
+  foo2 (n);  /* scalar variant.  */
+  for (i=0; i<N; i++)
+    if (x1[i] != t1[i] || z1[i] != t2[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c	(revision 224100)
+++ tree-vect-data-refs.c	(working copy)
@@ -998,7 +998,12 @@
   gimple stmt = DR_STMT (dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
+  /* Peeling of outer loops can't improve alignment.  */
+  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
+    return false;
+
   if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
     {
       /* For interleaved access we peel only if number of iterations in
@@ -1537,8 +1542,9 @@
     }
 
   /* Check if we can possibly peel the loop.  */
-  if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
+  if (do_peeling
+      && (!vect_can_advance_ivs_p (loop_vinfo)
+	  || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))))
     do_peeling = false;
 
   if (do_peeling
Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c	(revision 224100)
+++ tree-vect-loop-manip.c	(working copy)
@@ -97,10 +97,12 @@
 }
 
 
-/* Renames the variables in basic block BB.  */
+/* Renames the variables in basic block BB.  Allow renaming  of PHI argumnets
+   on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+   true.  */
 
 static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
 {
   gimple stmt;
   use_operand_p use_p;
@@ -108,7 +110,14 @@
   edge e;
   edge_iterator ei;
   struct loop *loop = bb->loop_father;
+  struct loop *outer_loop = NULL;
 
+  if (rename_from_outer_loop)
+    {
+      gcc_assert (loop);
+      outer_loop = loop_outer (loop);
+    }
+
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
        gsi_next (&gsi))
     {
@@ -119,7 +128,8 @@
 
   FOR_EACH_EDGE (e, ei, bb->preds)
     {
-      if (!flow_bb_inside_loop_p (loop, e->src))
+      if (!flow_bb_inside_loop_p (loop, e->src)
+	  && (!rename_from_outer_loop || e->src != outer_loop->header))
 	continue;
       for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	   gsi_next (&gsi))
@@ -775,6 +785,7 @@
   bool was_imm_dom;
   basic_block exit_dest;
   edge exit, new_exit;
+  bool duplicate_outer_loop = false;
 
   exit = single_exit (loop);
   at_exit = (e == exit);
@@ -786,7 +797,9 @@
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+  /* Allow duplication of outer loops.  */
+  if (scalar_loop->inner)
+    duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
   if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
@@ -855,7 +868,7 @@
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom)
+      if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -898,7 +911,7 @@
     }
 
   for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
-    rename_variables_in_bb (new_bbs[i]);
+    rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
   if (scalar_loop != loop)
     {
@@ -985,7 +998,9 @@
    (3) it is single entry, single exit
    (4) its exit condition is the last stmt in the header
    (5) E is the entry/exit edge of LOOP.
- */
+   Allow duplication of outer loops if it is consists of exactly 5 basic
+   blocks.
+   Other conditions are taken above.  */
 
 bool
 slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
@@ -995,6 +1010,12 @@
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
 
+  if (loop->inner && loop->num_nodes == 5
+      && empty_block_p (loop->latch)
+      && single_exit (loop) && (e == exit_e || e == entry_e)
+      && orig_cond && orig_cond == gsi_stmt (loop_exit_gsi))
+    return true;
+
   if (loop->inner
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-16 14:39   ` Yuri Rumyantsev
@ 2015-06-17 12:28     ` Richard Biener
  2015-06-17 17:13       ` Yuri Rumyantsev
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-06-17 12:28 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Thanks a lot Richard for your review.
>
> I presented updated patch which is not gated by force_vectorize.
> I added test on outer-loop in vect_enhance_data_refs_alignment
> and it returns false for it because we can not improve dr alighment
> through outer-loop peeling in general. So I assume that only
> versioning for alignment can be applied for targets do not support
> unaligned memory access.

@@ -998,7 +998,12 @@
   gimple stmt = DR_STMT (dr);
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);

+  /* Peeling of outer loops can't improve alignment.  */
+  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
+    return false;
+

but this looks wrong.  It depends on the context (DRs in the outer loop
can improve alignment by peeling the outer loop and we can still
peel the inner loop for alignment).

So IMHO the correct place to amend is vect_enhance_data_refs_alignment
(which it seems doesn't consider peeling the inner loop).

I'd say you should simply add

     || loop->inner)

to the

  /* Check if we can possibly peel the loop.  */
  if (!vect_can_advance_ivs_p (loop_vinfo)
      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
    do_peeling = false;

check.

> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
> as you proposed since it is not called outside vectorization.

There is still no reason for this complex condition, so please remove it.

_Please_ also generate diffs with -p, it is very tedious to see patch hunks
without a function name context.

You didn't explain why you needed the renaming changes as I don't
remember needing it when using the code from loop distribution.

> I also noticed one not-resolved issue with outer-loop peeling - we don't
> consider remainder for possible vectorization of inner-loop as we can see
> on the following example:
>
>   for (i = 0; i < n; i++) {
>     diff = 0;
>     for (j = 0; j < M; j++) {
>       diff += in[j+i]*coeff[j];
>     }
>     out[i] = diff;
>   }
>
> Is it worth to fix it?

You mean vectorizing the inner loop in the niter peel epilogue loop
of the outer loop?  Possibly yes.

Thanks,
Richard.

> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
> to allow renaming of PHI arguments on edges incoming from outer
> loop header, add corresponding check before start PHI iterator.
> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
> with true force_vectorize.  Set-up dominator for outer loop too.
> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
> was marked with force_vectorize and has restricted cfg.
> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
> not be reachable for outer loops.
> (vect_enhance_data_refs_alignment): Add test on true value of
> do_peeling.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>
> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Hi All,
>>>
>>> Here is a simple fix which allows duplication of outer loops to
>>> perform peeling for number of iterations if outer loop is marked with
>>> pragma omp simd.
>>>
>>> Bootstrap and regression testing did not show any new failures.
>>> Is it OK for trunk?
>>
>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>> on non-innermost loops...  (I just copied, I never called
>> slpeel_can_duplicate_loop_p though).
>>
>> So - you should just remove the loop->inner condition from
>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>> code as well (yeah, I never merged the nested loop support
>> for loop distribution...).
>>
>> Index: tree-vect-loop.c
>> ===================================================================
>> --- tree-vect-loop.c    (revision 224100)
>> +++ tree-vect-loop.c    (working copy)
>> @@ -1879,6 +1879,10 @@
>>        return false;
>>      }
>>
>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>
>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>> decided to peel for alignment then it has adjusted the DRs alignment
>> info already.  So instead of the above simply disallow peeling for
>> alignment in vect_enhance_data_refs_alignment?  Thus add
>> || ->inner to
>>
>>   /* Check if we can possibly peel the loop.  */
>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>     do_peeling = false;
>>
>> ?
>>
>> I also can't see why the improvement has to be gated on force_vect,
>> it surely looks profitable to enable more outer loop vectorization in
>> general, no?
>>
>> How do the cost model calculations end up with peeling the outer loop
>> for niter?
>>
>> On targets which don't support unaligned accesses we're left with
>> versioning for alignment.  Isn't peeling for alignment better there?
>> Thus only disallow peeling for alignment if there is no unhandled
>> alignment?
>>
>> Thanks,
>> Richard.
>>
>>> ChangeLog:
>>>
>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>> to allow renaming of PHI arguments on edges incoming from outer
>>> loop header, add corresponding check before start PHI iterator.
>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>> was marked with force_vectorize and has restricted cfg.
>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>> for outer loops.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-17 12:28     ` Richard Biener
@ 2015-06-17 17:13       ` Yuri Rumyantsev
  2015-06-29 16:53         ` Yuri Rumyantsev
  2015-07-14 11:00         ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Yuri Rumyantsev @ 2015-06-17 17:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 7882 bytes --]

Richard,

I attached updated patch.
You asked me to explain why I did changes for renaming.
If we do not change PHI arguments for inner loop header we can get the
following IR:
source outer loop:
  <bb 5>: outer-loop header
  # i_45 = PHI <0(4), i_18(9)>
  # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)>

  <bb 6>:inner-loop header
  # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)>

after duplication we have (without argument correction):
  <bb 12>:
  # i_74 = PHI <i_64(13), 0(17)>
  # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)>

  <bb 15>:
  # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)>

and later we get verifier error:
test1.c:20:6: error: definition in block 6 does not dominate use in block 10
 void foo (int n)
      ^
for SSA_NAME: .MEM_17 in statement:
.MEM_63 = PHI <.MEM_17(10), .MEM_97(14)>

and you can see that we need to rename MEM_17 argument for out-coming
edge to MEM_73 since
MEM_17 was converted to MEM_73 in outer-loop header.

This explains my simple fix in rename_variables_in_bb.
Note also that loop distribution is not performed for outer loops.

I also did a change in slpeel_can_duplicate_loop_p to simplify check.

Any comments will be appreciated.

Yuri.

2015-06-17 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks a lot Richard for your review.
>>
>> I presented updated patch which is not gated by force_vectorize.
>> I added test on outer-loop in vect_enhance_data_refs_alignment
>> and it returns false for it because we can not improve dr alighment
>> through outer-loop peeling in general. So I assume that only
>> versioning for alignment can be applied for targets do not support
>> unaligned memory access.
>
> @@ -998,7 +998,12 @@
>    gimple stmt = DR_STMT (dr);
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
> +  /* Peeling of outer loops can't improve alignment.  */
> +  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
> +    return false;
> +
>
> but this looks wrong.  It depends on the context (DRs in the outer loop
> can improve alignment by peeling the outer loop and we can still
> peel the inner loop for alignment).
>
> So IMHO the correct place to amend is vect_enhance_data_refs_alignment
> (which it seems doesn't consider peeling the inner loop).
>
> I'd say you should simply add
>
>      || loop->inner)
>
> to the
>
>   /* Check if we can possibly peel the loop.  */
>   if (!vect_can_advance_ivs_p (loop_vinfo)
>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>     do_peeling = false;
>
> check.
>
>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
>> as you proposed since it is not called outside vectorization.
>
> There is still no reason for this complex condition, so please remove it.
>
> _Please_ also generate diffs with -p, it is very tedious to see patch hunks
> without a function name context.
>
> You didn't explain why you needed the renaming changes as I don't
> remember needing it when using the code from loop distribution.
>
>> I also noticed one not-resolved issue with outer-loop peeling - we don't
>> consider remainder for possible vectorization of inner-loop as we can see
>> on the following example:
>>
>>   for (i = 0; i < n; i++) {
>>     diff = 0;
>>     for (j = 0; j < M; j++) {
>>       diff += in[j+i]*coeff[j];
>>     }
>>     out[i] = diff;
>>   }
>>
>> Is it worth to fix it?
>
> You mean vectorizing the inner loop in the niter peel epilogue loop
> of the outer loop?  Possibly yes.
>
> Thanks,
> Richard.
>
>> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>> to allow renaming of PHI arguments on edges incoming from outer
>> loop header, add corresponding check before start PHI iterator.
>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>> with true force_vectorize.  Set-up dominator for outer loop too.
>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>> was marked with force_vectorize and has restricted cfg.
>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
>> not be reachable for outer loops.
>> (vect_enhance_data_refs_alignment): Add test on true value of
>> do_peeling.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>>
>> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is a simple fix which allows duplication of outer loops to
>>>> perform peeling for number of iterations if outer loop is marked with
>>>> pragma omp simd.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>> Is it OK for trunk?
>>>
>>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>>> on non-innermost loops...  (I just copied, I never called
>>> slpeel_can_duplicate_loop_p though).
>>>
>>> So - you should just remove the loop->inner condition from
>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>>> code as well (yeah, I never merged the nested loop support
>>> for loop distribution...).
>>>
>>> Index: tree-vect-loop.c
>>> ===================================================================
>>> --- tree-vect-loop.c    (revision 224100)
>>> +++ tree-vect-loop.c    (working copy)
>>> @@ -1879,6 +1879,10 @@
>>>        return false;
>>>      }
>>>
>>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>>
>>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>>> decided to peel for alignment then it has adjusted the DRs alignment
>>> info already.  So instead of the above simply disallow peeling for
>>> alignment in vect_enhance_data_refs_alignment?  Thus add
>>> || ->inner to
>>>
>>>   /* Check if we can possibly peel the loop.  */
>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>     do_peeling = false;
>>>
>>> ?
>>>
>>> I also can't see why the improvement has to be gated on force_vect,
>>> it surely looks profitable to enable more outer loop vectorization in
>>> general, no?
>>>
>>> How do the cost model calculations end up with peeling the outer loop
>>> for niter?
>>>
>>> On targets which don't support unaligned accesses we're left with
>>> versioning for alignment.  Isn't peeling for alignment better there?
>>> Thus only disallow peeling for alignment if there is no unhandled
>>> alignment?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>>
>>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>> loop header, add corresponding check before start PHI iterator.
>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>> was marked with force_vectorize and has restricted cfg.
>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>>> for outer loops.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

[-- Attachment #2: patch.1.3 --]
[-- Type: application/octet-stream, Size: 6622 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
new file mode 100644
index 0000000..3ae1020
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+                         float *vx, float *vy)
+{
+  int j;
+    for (j = 0; j < N; ++j)
+    {
+        const float dx  = cx - px[j];
+        const float dy  = cy - py[j];
+        *vx               -= dx * tx[j];
+        *vy               -= dy * ty[j];
+    }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+  int i;
+#pragma omp simd
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+  volatile int i;
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+  float *X = (float*)malloc (N * 8 * sizeof (float));
+  int i;
+  int n = N - 1;
+  check_vect ();
+  px = &X[0];
+  py = &X[N * 1];
+  tx = &X[N * 2];
+  ty = &X[N * 3];
+  x1 = &X[N * 4];
+  z1 = &X[N * 5];
+  t1 = &X[N * 6];
+  t2 = &X[N * 7];
+
+  for (i=0; i<N; i++)
+    {
+      px[i] = (float) (i+2);
+      tx[i] = (float) (i+1);
+      py[i] = (float) (i+4);
+      ty[i] = (float) (i+3);
+      x1[i] = z1[i] = 1.0f;
+    }
+  foo1 (n);  /* vector variant.  */
+  for (i=0; i<N;i++)
+    {
+      t1[i] = x1[i]; x1[i] = 1.0f;
+      t2[i] = z1[i]; z1[i] = 1.0f;
+    }
+  foo2 (n);  /* scalar variant.  */
+  for (i=0; i<N; i++)
+    if (x1[i] != t1[i] || z1[i] != t2[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc1226..f6d7874 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1525,7 +1525,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
+      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || loop->inner)
     do_peeling = false;
 
   if (do_peeling
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 790cc98..414e357 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -88,10 +88,12 @@ rename_use_op (use_operand_p op_p)
 }
 
 
-/* Renames the variables in basic block BB.  */
+/* Renames the variables in basic block BB.  Allow renaming  of PHI argumnets
+   on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+   true.  */
 
 static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
 {
   gimple stmt;
   use_operand_p use_p;
@@ -99,6 +101,13 @@ rename_variables_in_bb (basic_block bb)
   edge e;
   edge_iterator ei;
   struct loop *loop = bb->loop_father;
+  struct loop *outer_loop = NULL;
+
+  if (rename_from_outer_loop)
+    {
+      gcc_assert (loop);
+      outer_loop = loop_outer (loop);
+    }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
        gsi_next (&gsi))
@@ -110,7 +119,8 @@ rename_variables_in_bb (basic_block bb)
 
   FOR_EACH_EDGE (e, ei, bb->preds)
     {
-      if (!flow_bb_inside_loop_p (loop, e->src))
+      if (!flow_bb_inside_loop_p (loop, e->src)
+	  && (!rename_from_outer_loop || e->src != outer_loop->header))
 	continue;
       for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	   gsi_next (&gsi))
@@ -766,6 +776,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
   bool was_imm_dom;
   basic_block exit_dest;
   edge exit, new_exit;
+  bool duplicate_outer_loop = false;
 
   exit = single_exit (loop);
   at_exit = (e == exit);
@@ -777,7 +788,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+  /* Allow duplication of outer loops.  */
+  if (scalar_loop->inner)
+    duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
   if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
@@ -846,7 +859,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom)
+      if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -889,7 +902,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
     }
 
   for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
-    rename_variables_in_bb (new_bbs[i]);
+    rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
   if (scalar_loop != loop)
     {
@@ -971,11 +984,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
 
 
 /* This function verifies that the following restrictions apply to LOOP:
-   (1) it is innermost
-   (2) it consists of exactly 2 basic blocks - header, and an empty latch.
-   (3) it is single entry, single exit
-   (4) its exit condition is the last stmt in the header
-   (5) E is the entry/exit edge of LOOP.
+   (1) it consists of exactly 2 basic blocks - header, and an empty latch
+       for innermost loop and 5 basic blocks for outer-loop.
+   (2) it is single entry, single exit
+   (3) its exit condition is the last stmt in the header
+   (4) E is the entry/exit edge of LOOP.
  */
 
 bool
@@ -985,12 +998,12 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
+  unsigned int num_bb = loop->inner? 5 : 2;
 
-  if (loop->inner
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
-      || !loop_outer (loop)
-      || loop->num_nodes != 2
+      if (!loop_outer (loop)
+      || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
       || !single_exit (loop)
       /* Verify that new loop exit condition can be trivially modified.  */

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-17 17:13       ` Yuri Rumyantsev
@ 2015-06-29 16:53         ` Yuri Rumyantsev
  2015-07-14 11:00           ` Richard Biener
  2015-07-14 11:00         ` Richard Biener
  1 sibling, 1 reply; 8+ messages in thread
From: Yuri Rumyantsev @ 2015-06-29 16:53 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin

[-- Attachment #1: Type: text/plain, Size: 9214 bytes --]

Hi All,

Here is updated patch containing missed change in
slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses
in inner loop.

ChangeLog:
2015-06-29  Yuri Rumyantsev  <ysrumyan@gmail.com>

* tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
to allow renaming of PHI arguments on edges incoming from outer
loop header, add corresponding check before start PHI iterator.
(slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
with true force_vectorize.  Set-up dominator for outer loop too.
Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
(slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
was marked with force_vectorize and has restricted cfg.
(slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in
inner loop.
* tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
do peeling for outer loops.

gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-outer-simd-2.c: New test.

2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
> Richard,
>
> I attached updated patch.
> You asked me to explain why I did changes for renaming.
> If we do not change PHI arguments for inner loop header we can get the
> following IR:
> source outer loop:
>   <bb 5>: outer-loop header
>   # i_45 = PHI <0(4), i_18(9)>
>   # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)>
>
>   <bb 6>:inner-loop header
>   # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)>
>
> after duplication we have (without argument correction):
>   <bb 12>:
>   # i_74 = PHI <i_64(13), 0(17)>
>   # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)>
>
>   <bb 15>:
>   # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)>
>
> and later we get verifier error:
> test1.c:20:6: error: definition in block 6 does not dominate use in block 10
>  void foo (int n)
>       ^
> for SSA_NAME: .MEM_17 in statement:
> .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)>
>
> and you can see that we need to rename MEM_17 argument for out-coming
> edge to MEM_73 since
> MEM_17 was converted to MEM_73 in outer-loop header.
>
> This explains my simple fix in rename_variables_in_bb.
> Note also that loop distribution is not performed for outer loops.
>
> I also did a change in slpeel_can_duplicate_loop_p to simplify check.
>
> Any comments will be appreciated.
>
> Yuri.
>
> 2015-06-17 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks a lot Richard for your review.
>>>
>>> I presented updated patch which is not gated by force_vectorize.
>>> I added test on outer-loop in vect_enhance_data_refs_alignment
>>> and it returns false for it because we can not improve dr alighment
>>> through outer-loop peeling in general. So I assume that only
>>> versioning for alignment can be applied for targets do not support
>>> unaligned memory access.
>>
>> @@ -998,7 +998,12 @@
>>    gimple stmt = DR_STMT (dr);
>>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>
>> +  /* Peeling of outer loops can't improve alignment.  */
>> +  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
>> +    return false;
>> +
>>
>> but this looks wrong.  It depends on the context (DRs in the outer loop
>> can improve alignment by peeling the outer loop and we can still
>> peel the inner loop for alignment).
>>
>> So IMHO the correct place to amend is vect_enhance_data_refs_alignment
>> (which it seems doesn't consider peeling the inner loop).
>>
>> I'd say you should simply add
>>
>>      || loop->inner)
>>
>> to the
>>
>>   /* Check if we can possibly peel the loop.  */
>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>     do_peeling = false;
>>
>> check.
>>
>>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
>>> as you proposed since it is not called outside vectorization.
>>
>> There is still no reason for this complex condition, so please remove it.
>>
>> _Please_ also generate diffs with -p, it is very tedious to see patch hunks
>> without a function name context.
>>
>> You didn't explain why you needed the renaming changes as I don't
>> remember needing it when using the code from loop distribution.
>>
>>> I also noticed one not-resolved issue with outer-loop peeling - we don't
>>> consider remainder for possible vectorization of inner-loop as we can see
>>> on the following example:
>>>
>>>   for (i = 0; i < n; i++) {
>>>     diff = 0;
>>>     for (j = 0; j < M; j++) {
>>>       diff += in[j+i]*coeff[j];
>>>     }
>>>     out[i] = diff;
>>>   }
>>>
>>> Is it worth to fix it?
>>
>> You mean vectorizing the inner loop in the niter peel epilogue loop
>> of the outer loop?  Possibly yes.
>>
>> Thanks,
>> Richard.
>>
>>> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>> to allow renaming of PHI arguments on edges incoming from outer
>>> loop header, add corresponding check before start PHI iterator.
>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>> was marked with force_vectorize and has restricted cfg.
>>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
>>> not be reachable for outer loops.
>>> (vect_enhance_data_refs_alignment): Add test on true value of
>>> do_peeling.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>>>
>>> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Here is a simple fix which allows duplication of outer loops to
>>>>> perform peeling for number of iterations if outer loop is marked with
>>>>> pragma omp simd.
>>>>>
>>>>> Bootstrap and regression testing did not show any new failures.
>>>>> Is it OK for trunk?
>>>>
>>>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>>>> on non-innermost loops...  (I just copied, I never called
>>>> slpeel_can_duplicate_loop_p though).
>>>>
>>>> So - you should just remove the loop->inner condition from
>>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>>>> code as well (yeah, I never merged the nested loop support
>>>> for loop distribution...).
>>>>
>>>> Index: tree-vect-loop.c
>>>> ===================================================================
>>>> --- tree-vect-loop.c    (revision 224100)
>>>> +++ tree-vect-loop.c    (working copy)
>>>> @@ -1879,6 +1879,10 @@
>>>>        return false;
>>>>      }
>>>>
>>>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>>>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>>>
>>>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>>>> decided to peel for alignment then it has adjusted the DRs alignment
>>>> info already.  So instead of the above simply disallow peeling for
>>>> alignment in vect_enhance_data_refs_alignment?  Thus add
>>>> || ->inner to
>>>>
>>>>   /* Check if we can possibly peel the loop.  */
>>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>>     do_peeling = false;
>>>>
>>>> ?
>>>>
>>>> I also can't see why the improvement has to be gated on force_vect,
>>>> it surely looks profitable to enable more outer loop vectorization in
>>>> general, no?
>>>>
>>>> How do the cost model calculations end up with peeling the outer loop
>>>> for niter?
>>>>
>>>> On targets which don't support unaligned accesses we're left with
>>>> versioning for alignment.  Isn't peeling for alignment better there?
>>>> Thus only disallow peeling for alignment if there is no unhandled
>>>> alignment?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>>> loop header, add corresponding check before start PHI iterator.
>>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>>> was marked with force_vectorize and has restricted cfg.
>>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>>>> for outer loops.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

[-- Attachment #2: patch.1.4 --]
[-- Type: application/octet-stream, Size: 7973 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
new file mode 100644
index 0000000..3ae1020
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+                         float *vx, float *vy)
+{
+  int j;
+    for (j = 0; j < N; ++j)
+    {
+        const float dx  = cx - px[j];
+        const float dy  = cy - py[j];
+        *vx               -= dx * tx[j];
+        *vy               -= dy * ty[j];
+    }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+  int i;
+#pragma omp simd
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+  volatile int i;
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+  float *X = (float*)malloc (N * 8 * sizeof (float));
+  int i;
+  int n = N - 1;
+  check_vect ();
+  px = &X[0];
+  py = &X[N * 1];
+  tx = &X[N * 2];
+  ty = &X[N * 3];
+  x1 = &X[N * 4];
+  z1 = &X[N * 5];
+  t1 = &X[N * 6];
+  t2 = &X[N * 7];
+
+  for (i=0; i<N; i++)
+    {
+      px[i] = (float) (i+2);
+      tx[i] = (float) (i+1);
+      py[i] = (float) (i+4);
+      ty[i] = (float) (i+3);
+      x1[i] = z1[i] = 1.0f;
+    }
+  foo1 (n);  /* vector variant.  */
+  for (i=0; i<N;i++)
+    {
+      t1[i] = x1[i]; x1[i] = 1.0f;
+      t2[i] = z1[i]; z1[i] = 1.0f;
+    }
+  foo2 (n);  /* scalar variant.  */
+  for (i=0; i<N; i++)
+    if (x1[i] != t1[i] || z1[i] != t2[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc1226..f6d7874 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1525,7 +1525,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
+      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || loop->inner)
     do_peeling = false;
 
   if (do_peeling
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 790cc98..bcaf81c 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -88,10 +88,12 @@ rename_use_op (use_operand_p op_p)
 }
 
 
-/* Renames the variables in basic block BB.  */
+/* Renames the variables in basic block BB.  Allow renaming  of PHI argumnets
+   on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+   true.  */
 
 static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
 {
   gimple stmt;
   use_operand_p use_p;
@@ -99,6 +101,13 @@ rename_variables_in_bb (basic_block bb)
   edge e;
   edge_iterator ei;
   struct loop *loop = bb->loop_father;
+  struct loop *outer_loop = NULL;
+
+  if (rename_from_outer_loop)
+    {
+      gcc_assert (loop);
+      outer_loop = loop_outer (loop);
+    }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
        gsi_next (&gsi))
@@ -110,7 +119,8 @@ rename_variables_in_bb (basic_block bb)
 
   FOR_EACH_EDGE (e, ei, bb->preds)
     {
-      if (!flow_bb_inside_loop_p (loop, e->src))
+      if (!flow_bb_inside_loop_p (loop, e->src)
+	  && (!rename_from_outer_loop || e->src != outer_loop->header))
 	continue;
       for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	   gsi_next (&gsi))
@@ -766,6 +776,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
   bool was_imm_dom;
   basic_block exit_dest;
   edge exit, new_exit;
+  bool duplicate_outer_loop = false;
 
   exit = single_exit (loop);
   at_exit = (e == exit);
@@ -777,7 +788,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+  /* Allow duplication of outer loops.  */
+  if (scalar_loop->inner)
+    duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
   if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
@@ -846,7 +859,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom)
+      if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -889,7 +902,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
     }
 
   for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
-    rename_variables_in_bb (new_bbs[i]);
+    rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
   if (scalar_loop != loop)
     {
@@ -971,11 +984,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
 
 
 /* This function verifies that the following restrictions apply to LOOP:
-   (1) it is innermost
-   (2) it consists of exactly 2 basic blocks - header, and an empty latch.
-   (3) it is single entry, single exit
-   (4) its exit condition is the last stmt in the header
-   (5) E is the entry/exit edge of LOOP.
+   (1) it consists of exactly 2 basic blocks - header, and an empty latch
+       for innermost loop and 5 basic blocks for outer-loop.
+   (2) it is single entry, single exit
+   (3) its exit condition is the last stmt in the header
+   (4) E is the entry/exit edge of LOOP.
  */
 
 bool
@@ -985,12 +998,12 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
+  unsigned int num_bb = loop->inner? 5 : 2;
 
-  if (loop->inner
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
-      || !loop_outer (loop)
-      || loop->num_nodes != 2
+      if (!loop_outer (loop)
+      || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
       || !single_exit (loop)
       /* Verify that new loop exit condition can be trivially modified.  */
@@ -1176,6 +1189,7 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
 			       int bound1, int bound2)
 {
   struct loop *new_loop = NULL, *first_loop, *second_loop;
+  struct loop *inner_loop = NULL;
   edge skip_e;
   tree pre_condition = NULL_TREE;
   basic_block bb_before_second_loop, bb_after_second_loop;
@@ -1196,6 +1210,9 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
   if (!slpeel_can_duplicate_loop_p (loop, e))
     return NULL;
 
+  if (loop->inner)
+    inner_loop = loop->inner;
+
   /* We might have a queued need to update virtual SSA form.  As we
      delete the update SSA machinery below after doing a regular
      incremental SSA update during loop copying make sure we don't
@@ -1231,7 +1248,9 @@ slpeel_tree_peel_loop_to_edge (struct loop *loop, struct loop *scalar_loop,
 	    add_phi_arg (new_phi, vop, exit_e, UNKNOWN_LOCATION);
 	    gimple_phi_set_result (new_phi, new_vop);
 	    FOR_EACH_IMM_USE_STMT (stmt, imm_iter, vop)
-	      if (stmt != new_phi && gimple_bb (stmt) != loop->header)
+	      if (stmt != new_phi && gimple_bb (stmt) != loop->header
+		  /* Do not rename PHI arguments in inner-loop.  */
+		  && (!inner_loop || gimple_bb (stmt) != inner_loop->header))
 		FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter)
 		  SET_USE (use_p, new_vop);
 	  }

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-17 17:13       ` Yuri Rumyantsev
  2015-06-29 16:53         ` Yuri Rumyantsev
@ 2015-07-14 11:00         ` Richard Biener
  1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-07-14 11:00 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Wed, Jun 17, 2015 at 6:35 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Richard,
>
> I attached updated patch.
> You asked me to explain why I did changes for renaming.
> If we do not change PHI arguments for inner loop header we can get the
> following IR:
> source outer loop:
>   <bb 5>: outer-loop header
>   # i_45 = PHI <0(4), i_18(9)>
>   # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)>
>
>   <bb 6>:inner-loop header
>   # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)>
>
> after duplication we have (without argument correction):
>   <bb 12>:
>   # i_74 = PHI <i_64(13), 0(17)>
>   # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)>
>
>   <bb 15>:
>   # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)>
>
> and later we get verifier error:
> test1.c:20:6: error: definition in block 6 does not dominate use in block 10
>  void foo (int n)
>       ^
> for SSA_NAME: .MEM_17 in statement:
> .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)>
>
> and you can see that we need to rename MEM_17 argument for out-coming
> edge to MEM_73 since
> MEM_17 was converted to MEM_73 in outer-loop header.
>
> This explains my simple fix in rename_variables_in_bb.
> Note also that loop distribution is not performed for outer loops.

Yes, I never committed the patch implementing this...  but I also now
see that loop-distribution always re-writes the virtual SSA web.

Richard.

> I also did a change in slpeel_can_duplicate_loop_p to simplify check.
>
> Any comments will be appreciated.
>
> Yuri.
>
> 2015-06-17 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>> Thanks a lot Richard for your review.
>>>
>>> I presented updated patch which is not gated by force_vectorize.
>>> I added test on outer-loop in vect_enhance_data_refs_alignment
>>> and it returns false for it because we can not improve dr alighment
>>> through outer-loop peeling in general. So I assume that only
>>> versioning for alignment can be applied for targets do not support
>>> unaligned memory access.
>>
>> @@ -998,7 +998,12 @@
>>    gimple stmt = DR_STMT (dr);
>>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>
>> +  /* Peeling of outer loops can't improve alignment.  */
>> +  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
>> +    return false;
>> +
>>
>> but this looks wrong.  It depends on the context (DRs in the outer loop
>> can improve alignment by peeling the outer loop and we can still
>> peel the inner loop for alignment).
>>
>> So IMHO the correct place to amend is vect_enhance_data_refs_alignment
>> (which it seems doesn't consider peeling the inner loop).
>>
>> I'd say you should simply add
>>
>>      || loop->inner)
>>
>> to the
>>
>>   /* Check if we can possibly peel the loop.  */
>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>     do_peeling = false;
>>
>> check.
>>
>>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
>>> as you proposed since it is not called outside vectorization.
>>
>> There is still no reason for this complex condition, so please remove it.
>>
>> _Please_ also generate diffs with -p, it is very tedious to see patch hunks
>> without a function name context.
>>
>> You didn't explain why you needed the renaming changes as I don't
>> remember needing it when using the code from loop distribution.
>>
>>> I also noticed one not-resolved issue with outer-loop peeling - we don't
>>> consider remainder for possible vectorization of inner-loop as we can see
>>> on the following example:
>>>
>>>   for (i = 0; i < n; i++) {
>>>     diff = 0;
>>>     for (j = 0; j < M; j++) {
>>>       diff += in[j+i]*coeff[j];
>>>     }
>>>     out[i] = diff;
>>>   }
>>>
>>> Is it worth to fix it?
>>
>> You mean vectorizing the inner loop in the niter peel epilogue loop
>> of the outer loop?  Possibly yes.
>>
>> Thanks,
>> Richard.
>>
>>> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>
>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>> to allow renaming of PHI arguments on edges incoming from outer
>>> loop header, add corresponding check before start PHI iterator.
>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>> was marked with force_vectorize and has restricted cfg.
>>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
>>> not be reachable for outer loops.
>>> (vect_enhance_data_refs_alignment): Add test on true value of
>>> do_peeling.
>>>
>>> gcc/testsuite/ChangeLog:
>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>>>
>>> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>> Hi All,
>>>>>
>>>>> Here is a simple fix which allows duplication of outer loops to
>>>>> perform peeling for number of iterations if outer loop is marked with
>>>>> pragma omp simd.
>>>>>
>>>>> Bootstrap and regression testing did not show any new failures.
>>>>> Is it OK for trunk?
>>>>
>>>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>>>> on non-innermost loops...  (I just copied, I never called
>>>> slpeel_can_duplicate_loop_p though).
>>>>
>>>> So - you should just remove the loop->inner condition from
>>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>>>> code as well (yeah, I never merged the nested loop support
>>>> for loop distribution...).
>>>>
>>>> Index: tree-vect-loop.c
>>>> ===================================================================
>>>> --- tree-vect-loop.c    (revision 224100)
>>>> +++ tree-vect-loop.c    (working copy)
>>>> @@ -1879,6 +1879,10 @@
>>>>        return false;
>>>>      }
>>>>
>>>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>>>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>>>
>>>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>>>> decided to peel for alignment then it has adjusted the DRs alignment
>>>> info already.  So instead of the above simply disallow peeling for
>>>> alignment in vect_enhance_data_refs_alignment?  Thus add
>>>> || ->inner to
>>>>
>>>>   /* Check if we can possibly peel the loop.  */
>>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>>     do_peeling = false;
>>>>
>>>> ?
>>>>
>>>> I also can't see why the improvement has to be gated on force_vect,
>>>> it surely looks profitable to enable more outer loop vectorization in
>>>> general, no?
>>>>
>>>> How do the cost model calculations end up with peeling the outer loop
>>>> for niter?
>>>>
>>>> On targets which don't support unaligned accesses we're left with
>>>> versioning for alignment.  Isn't peeling for alignment better there?
>>>> Thus only disallow peeling for alignment if there is no unhandled
>>>> alignment?
>>>>
>>>> Thanks,
>>>> Richard.
>>>>
>>>>> ChangeLog:
>>>>>
>>>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>
>>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>>> loop header, add corresponding check before start PHI iterator.
>>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>>> was marked with force_vectorize and has restricted cfg.
>>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>>>> for outer loops.
>>>>>
>>>>> gcc/testsuite/ChangeLog:
>>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
  2015-06-29 16:53         ` Yuri Rumyantsev
@ 2015-07-14 11:00           ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-07-14 11:00 UTC (permalink / raw)
  To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin

On Mon, Jun 29, 2015 at 6:15 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
> Hi All,
>
> Here is updated patch containing missed change in
> slpeel_tree_peel_loop_to_edge which prevents renaming of exit PHI uses
> in inner loop.

Ok.

Thanks,
Richard.

> ChangeLog:
> 2015-06-29  Yuri Rumyantsev  <ysrumyan@gmail.com>
>
> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
> to allow renaming of PHI arguments on edges incoming from outer
> loop header, add corresponding check before start PHI iterator.
> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
> with true force_vectorize.  Set-up dominator for outer loop too.
> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
> was marked with force_vectorize and has restricted cfg.
> (slpeel_tree_peel_loop_to_edge): Do not rename exit PHI uses in
> inner loop.
> * tree-vect-data-refs.c (vect_enhance_data_refs_alignment): Do not
> do peeling for outer loops.
>
> gcc/testsuite/ChangeLog:
> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>
> 2015-06-17 19:35 GMT+03:00 Yuri Rumyantsev <ysrumyan@gmail.com>:
>> Richard,
>>
>> I attached updated patch.
>> You asked me to explain why I did changes for renaming.
>> If we do not change PHI arguments for inner loop header we can get the
>> following IR:
>> source outer loop:
>>   <bb 5>: outer-loop header
>>   # i_45 = PHI <0(4), i_18(9)>
>>   # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)>
>>
>>   <bb 6>:inner-loop header
>>   # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)>
>>
>> after duplication we have (without argument correction):
>>   <bb 12>:
>>   # i_74 = PHI <i_64(13), 0(17)>
>>   # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)>
>>
>>   <bb 15>:
>>   # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)>
>>
>> and later we get verifier error:
>> test1.c:20:6: error: definition in block 6 does not dominate use in block 10
>>  void foo (int n)
>>       ^
>> for SSA_NAME: .MEM_17 in statement:
>> .MEM_63 = PHI <.MEM_17(10), .MEM_97(14)>
>>
>> and you can see that we need to rename MEM_17 argument for out-coming
>> edge to MEM_73 since
>> MEM_17 was converted to MEM_73 in outer-loop header.
>>
>> This explains my simple fix in rename_variables_in_bb.
>> Note also that loop distribution is not performed for outer loops.
>>
>> I also did a change in slpeel_can_duplicate_loop_p to simplify check.
>>
>> Any comments will be appreciated.
>>
>> Yuri.
>>
>> 2015-06-17 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Thanks a lot Richard for your review.
>>>>
>>>> I presented updated patch which is not gated by force_vectorize.
>>>> I added test on outer-loop in vect_enhance_data_refs_alignment
>>>> and it returns false for it because we can not improve dr alighment
>>>> through outer-loop peeling in general. So I assume that only
>>>> versioning for alignment can be applied for targets do not support
>>>> unaligned memory access.
>>>
>>> @@ -998,7 +998,12 @@
>>>    gimple stmt = DR_STMT (dr);
>>>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
>>> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>>>
>>> +  /* Peeling of outer loops can't improve alignment.  */
>>> +  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>> +    return false;
>>> +
>>>
>>> but this looks wrong.  It depends on the context (DRs in the outer loop
>>> can improve alignment by peeling the outer loop and we can still
>>> peel the inner loop for alignment).
>>>
>>> So IMHO the correct place to amend is vect_enhance_data_refs_alignment
>>> (which it seems doesn't consider peeling the inner loop).
>>>
>>> I'd say you should simply add
>>>
>>>      || loop->inner)
>>>
>>> to the
>>>
>>>   /* Check if we can possibly peel the loop.  */
>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>     do_peeling = false;
>>>
>>> check.
>>>
>>>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
>>>> as you proposed since it is not called outside vectorization.
>>>
>>> There is still no reason for this complex condition, so please remove it.
>>>
>>> _Please_ also generate diffs with -p, it is very tedious to see patch hunks
>>> without a function name context.
>>>
>>> You didn't explain why you needed the renaming changes as I don't
>>> remember needing it when using the code from loop distribution.
>>>
>>>> I also noticed one not-resolved issue with outer-loop peeling - we don't
>>>> consider remainder for possible vectorization of inner-loop as we can see
>>>> on the following example:
>>>>
>>>>   for (i = 0; i < n; i++) {
>>>>     diff = 0;
>>>>     for (j = 0; j < M; j++) {
>>>>       diff += in[j+i]*coeff[j];
>>>>     }
>>>>     out[i] = diff;
>>>>   }
>>>>
>>>> Is it worth to fix it?
>>>
>>> You mean vectorizing the inner loop in the niter peel epilogue loop
>>> of the outer loop?  Possibly yes.
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>> loop header, add corresponding check before start PHI iterator.
>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>> was marked with force_vectorize and has restricted cfg.
>>>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
>>>> not be reachable for outer loops.
>>>> (vect_enhance_data_refs_alignment): Add test on true value of
>>>> do_peeling.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>>>>
>>>> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> Here is a simple fix which allows duplication of outer loops to
>>>>>> perform peeling for number of iterations if outer loop is marked with
>>>>>> pragma omp simd.
>>>>>>
>>>>>> Bootstrap and regression testing did not show any new failures.
>>>>>> Is it OK for trunk?
>>>>>
>>>>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>>>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>>>>> on non-innermost loops...  (I just copied, I never called
>>>>> slpeel_can_duplicate_loop_p though).
>>>>>
>>>>> So - you should just remove the loop->inner condition from
>>>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>>>>> code as well (yeah, I never merged the nested loop support
>>>>> for loop distribution...).
>>>>>
>>>>> Index: tree-vect-loop.c
>>>>> ===================================================================
>>>>> --- tree-vect-loop.c    (revision 224100)
>>>>> +++ tree-vect-loop.c    (working copy)
>>>>> @@ -1879,6 +1879,10 @@
>>>>>        return false;
>>>>>      }
>>>>>
>>>>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>>>>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>>>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>>>>
>>>>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>>>>> decided to peel for alignment then it has adjusted the DRs alignment
>>>>> info already.  So instead of the above simply disallow peeling for
>>>>> alignment in vect_enhance_data_refs_alignment?  Thus add
>>>>> || ->inner to
>>>>>
>>>>>   /* Check if we can possibly peel the loop.  */
>>>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>>>     do_peeling = false;
>>>>>
>>>>> ?
>>>>>
>>>>> I also can't see why the improvement has to be gated on force_vect,
>>>>> it surely looks profitable to enable more outer loop vectorization in
>>>>> general, no?
>>>>>
>>>>> How do the cost model calculations end up with peeling the outer loop
>>>>> for niter?
>>>>>
>>>>> On targets which don't support unaligned accesses we're left with
>>>>> versioning for alignment.  Isn't peeling for alignment better there?
>>>>> Thus only disallow peeling for alignment if there is no unhandled
>>>>> alignment?
>>>>>
>>>>> Thanks,
>>>>> Richard.
>>>>>
>>>>>> ChangeLog:
>>>>>>
>>>>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>>>
>>>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>>>> loop header, add corresponding check before start PHI iterator.
>>>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>>>> was marked with force_vectorize and has restricted cfg.
>>>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>>>>> for outer loops.
>>>>>>
>>>>>> gcc/testsuite/ChangeLog:
>>>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2015-07-14 11:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-08 10:43 [PATCH] Yet another simple fix to enhance outer-loop vectorization Yuri Rumyantsev
2015-06-09 13:28 ` Richard Biener
2015-06-16 14:39   ` Yuri Rumyantsev
2015-06-17 12:28     ` Richard Biener
2015-06-17 17:13       ` Yuri Rumyantsev
2015-06-29 16:53         ` Yuri Rumyantsev
2015-07-14 11:00           ` Richard Biener
2015-07-14 11:00         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).