Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
	Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
Date: Wed, 17 Jun 2015 17:13:00 -0000	[thread overview]
Message-ID: <CAEoMCqRi9GJhnqkB4Yea3ENZce8pbrH6Qc53hVUBgQHCYko7Pw@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc239jQ+ZyoNHsrBMnT=u0z0DXJKDB=WvfJPbqXa_hYZQw@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 7882 bytes --]

Richard,

I attached updated patch.
You asked me to explain why I did changes for renaming.
If we do not change PHI arguments for inner loop header we can get the
following IR:
source outer loop:
  <bb 5>: outer-loop header
  # i_45 = PHI <0(4), i_18(9)>
  # .MEM_17 = PHI <.MEM_4(D)(4), .MEM_44(9)>

  <bb 6>:inner-loop header
  # .MEM_46 = PHI <.MEM_44(7), .MEM_17(5)>

after duplication we have (without argument correction):
  <bb 12>:
  # i_74 = PHI <i_64(13), 0(17)>
  # .MEM_73 = PHI <.MEM_97(13), .MEM_4(D)(17)>

  <bb 15>:
  # .MEM_63 = PHI <.MEM_17(12), .MEM_97(16)>

and later we get verifier error:
test1.c:20:6: error: definition in block 6 does not dominate use in block 10
 void foo (int n)
      ^
for SSA_NAME: .MEM_17 in statement:
.MEM_63 = PHI <.MEM_17(10), .MEM_97(14)>

and you can see that we need to rename MEM_17 argument for out-coming
edge to MEM_73 since
MEM_17 was converted to MEM_73 in outer-loop header.

This explains my simple fix in rename_variables_in_bb.
Note also that loop distribution is not performed for outer loops.

I also did a change in slpeel_can_duplicate_loop_p to simplify check.

Any comments will be appreciated.

Yuri.

2015-06-17 15:24 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Tue, Jun 16, 2015 at 4:12 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Thanks a lot Richard for your review.
>>
>> I presented updated patch which is not gated by force_vectorize.
>> I added test on outer-loop in vect_enhance_data_refs_alignment
>> and it returns false for it because we can not improve dr alighment
>> through outer-loop peeling in general. So I assume that only
>> versioning for alignment can be applied for targets do not support
>> unaligned memory access.
>
> @@ -998,7 +998,12 @@
>    gimple stmt = DR_STMT (dr);
>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>    tree vectype = STMT_VINFO_VECTYPE (stmt_info);
> +  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
> +  /* Peeling of outer loops can't improve alignment.  */
> +  if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
> +    return false;
> +
>
> but this looks wrong.  It depends on the context (DRs in the outer loop
> can improve alignment by peeling the outer loop and we can still
> peel the inner loop for alignment).
>
> So IMHO the correct place to amend is vect_enhance_data_refs_alignment
> (which it seems doesn't consider peeling the inner loop).
>
> I'd say you should simply add
>
>      || loop->inner)
>
> to the
>
>   /* Check if we can possibly peel the loop.  */
>   if (!vect_can_advance_ivs_p (loop_vinfo)
>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>     do_peeling = false;
>
> check.
>
>> I did not change tests for outer loops in slpeel_can_duplicate_loop_p
>> as you proposed since it is not called outside vectorization.
>
> There is still no reason for this complex condition, so please remove it.
>
> _Please_ also generate diffs with -p, it is very tedious to see patch hunks
> without a function name context.
>
> You didn't explain why you needed the renaming changes as I don't
> remember needing it when using the code from loop distribution.
>
>> I also noticed one not-resolved issue with outer-loop peeling - we don't
>> consider remainder for possible vectorization of inner-loop as we can see
>> on the following example:
>>
>>   for (i = 0; i < n; i++) {
>>     diff = 0;
>>     for (j = 0; j < M; j++) {
>>       diff += in[j+i]*coeff[j];
>>     }
>>     out[i] = diff;
>>   }
>>
>> Is it worth to fix it?
>
> You mean vectorizing the inner loop in the niter peel epilogue loop
> of the outer loop?  Possibly yes.
>
> Thanks,
> Richard.
>
>> 2015-06-16  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>
>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>> to allow renaming of PHI arguments on edges incoming from outer
>> loop header, add corresponding check before start PHI iterator.
>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>> with true force_vectorize.  Set-up dominator for outer loop too.
>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>> was marked with force_vectorize and has restricted cfg.
>> * tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
>> not be reachable for outer loops.
>> (vect_enhance_data_refs_alignment): Add test on true value of
>> do_peeling.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
>>
>> 2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
>>> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>>>> Hi All,
>>>>
>>>> Here is a simple fix which allows duplication of outer loops to
>>>> perform peeling for number of iterations if outer loop is marked with
>>>> pragma omp simd.
>>>>
>>>> Bootstrap and regression testing did not show any new failures.
>>>> Is it OK for trunk?
>>>
>>> Hmm, I don't remember needing to adjust rename_variables_in_bb
>>> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
>>> on non-innermost loops...  (I just copied, I never called
>>> slpeel_can_duplicate_loop_p though).
>>>
>>> So - you should just remove the loop->inner condition from
>>> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
>>> code as well (yeah, I never merged the nested loop support
>>> for loop distribution...).
>>>
>>> Index: tree-vect-loop.c
>>> ===================================================================
>>> --- tree-vect-loop.c    (revision 224100)
>>> +++ tree-vect-loop.c    (working copy)
>>> @@ -1879,6 +1879,10 @@
>>>        return false;
>>>      }
>>>
>>> +  /* Peeling for alignment is not supported for outer-loop vectorization.  */
>>> +  if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
>>> +    LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>>>
>>> I think you can't simply do this - if vect_enhance_data_refs_alignment
>>> decided to peel for alignment then it has adjusted the DRs alignment
>>> info already.  So instead of the above simply disallow peeling for
>>> alignment in vect_enhance_data_refs_alignment?  Thus add
>>> || ->inner to
>>>
>>>   /* Check if we can possibly peel the loop.  */
>>>   if (!vect_can_advance_ivs_p (loop_vinfo)
>>>       || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
>>>     do_peeling = false;
>>>
>>> ?
>>>
>>> I also can't see why the improvement has to be gated on force_vect,
>>> it surely looks profitable to enable more outer loop vectorization in
>>> general, no?
>>>
>>> How do the cost model calculations end up with peeling the outer loop
>>> for niter?
>>>
>>> On targets which don't support unaligned accesses we're left with
>>> versioning for alignment.  Isn't peeling for alignment better there?
>>> Thus only disallow peeling for alignment if there is no unhandled
>>> alignment?
>>>
>>> Thanks,
>>> Richard.
>>>
>>>> ChangeLog:
>>>>
>>>> 2015-06-08  Yuri Rumyantsev  <ysrumyan@gmail.com>
>>>>
>>>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>>>> to allow renaming of PHI arguments on edges incoming from outer
>>>> loop header, add corresponding check before start PHI iterator.
>>>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>>>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>>>> with true force_vectorize.  Set-up dominator for outer loop too.
>>>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>>>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>>>> was marked with force_vectorize and has restricted cfg.
>>>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>>>> for outer loops.
>>>>
>>>> gcc/testsuite/ChangeLog:
>>>> * gcc.dg/vect/vect-outer-simd-2.c: New test.

[-- Attachment #2: patch.1.3 --]
[-- Type: application/octet-stream, Size: 6622 bytes --]

diff --git a/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
new file mode 100644
index 0000000..3ae1020
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-outer-simd-2.c
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+                         float *vx, float *vy)
+{
+  int j;
+    for (j = 0; j < N; ++j)
+    {
+        const float dx  = cx - px[j];
+        const float dy  = cy - py[j];
+        *vx               -= dx * tx[j];
+        *vy               -= dy * ty[j];
+    }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+  int i;
+#pragma omp simd
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+  volatile int i;
+  for (i=0; i<n; i++)
+    bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+  float *X = (float*)malloc (N * 8 * sizeof (float));
+  int i;
+  int n = N - 1;
+  check_vect ();
+  px = &X[0];
+  py = &X[N * 1];
+  tx = &X[N * 2];
+  ty = &X[N * 3];
+  x1 = &X[N * 4];
+  z1 = &X[N * 5];
+  t1 = &X[N * 6];
+  t2 = &X[N * 7];
+
+  for (i=0; i<N; i++)
+    {
+      px[i] = (float) (i+2);
+      tx[i] = (float) (i+1);
+      py[i] = (float) (i+4);
+      ty[i] = (float) (i+3);
+      x1[i] = z1[i] = 1.0f;
+    }
+  foo1 (n);  /* vector variant.  */
+  for (i=0; i<N;i++)
+    {
+      t1[i] = x1[i]; x1[i] = 1.0f;
+      t2[i] = z1[i]; z1[i] = 1.0f;
+    }
+  foo2 (n);  /* scalar variant.  */
+  for (i=0; i<N; i++)
+    if (x1[i] != t1[i] || z1[i] != t2[i])
+      abort ();
+  return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 3fc1226..f6d7874 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -1525,7 +1525,8 @@ vect_enhance_data_refs_alignment (loop_vec_info loop_vinfo)
 
   /* Check if we can possibly peel the loop.  */
   if (!vect_can_advance_ivs_p (loop_vinfo)
-      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
+      || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))
+      || loop->inner)
     do_peeling = false;
 
   if (do_peeling
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 790cc98..414e357 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -88,10 +88,12 @@ rename_use_op (use_operand_p op_p)
 }
 
 
-/* Renames the variables in basic block BB.  */
+/* Renames the variables in basic block BB.  Allow renaming  of PHI argumnets
+   on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+   true.  */
 
 static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
 {
   gimple stmt;
   use_operand_p use_p;
@@ -99,6 +101,13 @@ rename_variables_in_bb (basic_block bb)
   edge e;
   edge_iterator ei;
   struct loop *loop = bb->loop_father;
+  struct loop *outer_loop = NULL;
+
+  if (rename_from_outer_loop)
+    {
+      gcc_assert (loop);
+      outer_loop = loop_outer (loop);
+    }
 
   for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
        gsi_next (&gsi))
@@ -110,7 +119,8 @@ rename_variables_in_bb (basic_block bb)
 
   FOR_EACH_EDGE (e, ei, bb->preds)
     {
-      if (!flow_bb_inside_loop_p (loop, e->src))
+      if (!flow_bb_inside_loop_p (loop, e->src)
+	  && (!rename_from_outer_loop || e->src != outer_loop->header))
 	continue;
       for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
 	   gsi_next (&gsi))
@@ -766,6 +776,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
   bool was_imm_dom;
   basic_block exit_dest;
   edge exit, new_exit;
+  bool duplicate_outer_loop = false;
 
   exit = single_exit (loop);
   at_exit = (e == exit);
@@ -777,7 +788,9 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
 
   bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
   get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+  /* Allow duplication of outer loops.  */
+  if (scalar_loop->inner)
+    duplicate_outer_loop = true;
   /* Check whether duplication is possible.  */
   if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
     {
@@ -846,7 +859,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
       redirect_edge_and_branch_force (e, new_preheader);
       flush_pending_stmts (e);
       set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
-      if (was_imm_dom)
+      if (was_imm_dom || duplicate_outer_loop)
 	set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
 
       /* And remove the non-necessary forwarder again.  Keep the other
@@ -889,7 +902,7 @@ slpeel_tree_duplicate_loop_to_edge_cfg (struct loop *loop,
     }
 
   for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
-    rename_variables_in_bb (new_bbs[i]);
+    rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
 
   if (scalar_loop != loop)
     {
@@ -971,11 +984,11 @@ slpeel_add_loop_guard (basic_block guard_bb, tree cond,
 
 
 /* This function verifies that the following restrictions apply to LOOP:
-   (1) it is innermost
-   (2) it consists of exactly 2 basic blocks - header, and an empty latch.
-   (3) it is single entry, single exit
-   (4) its exit condition is the last stmt in the header
-   (5) E is the entry/exit edge of LOOP.
+   (1) it consists of exactly 2 basic blocks - header, and an empty latch
+       for innermost loop and 5 basic blocks for outer-loop.
+   (2) it is single entry, single exit
+   (3) its exit condition is the last stmt in the header
+   (4) E is the entry/exit edge of LOOP.
  */
 
 bool
@@ -985,12 +998,12 @@ slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
   edge entry_e = loop_preheader_edge (loop);
   gcond *orig_cond = get_loop_exit_condition (loop);
   gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
+  unsigned int num_bb = loop->inner? 5 : 2;
 
-  if (loop->inner
       /* All loops have an outer scope; the only case loop->outer is NULL is for
          the function itself.  */
-      || !loop_outer (loop)
-      || loop->num_nodes != 2
+      if (!loop_outer (loop)
+      || loop->num_nodes != num_bb
       || !empty_block_p (loop->latch)
       || !single_exit (loop)
       /* Verify that new loop exit condition can be trivially modified.  */

next prev parent reply	other threads:[~2015-06-17 16:35 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-06-08 10:43 Yuri Rumyantsev
2015-06-09 13:28 ` Richard Biener
2015-06-16 14:39   ` Yuri Rumyantsev
2015-06-17 12:28     ` Richard Biener
2015-06-17 17:13       ` Yuri Rumyantsev [this message]
2015-06-29 16:53         ` Yuri Rumyantsev
2015-07-14 11:00           ` Richard Biener
2015-07-14 11:00         ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAEoMCqRi9GJhnqkB4Yea3ENZce8pbrH6Qc53hVUBgQHCYko7Pw@mail.gmail.com \
    --to=ysrumyan@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=izamyatin@gmail.com \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).