From: Yuri Rumyantsev <ysrumyan@gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
Cc: gcc-patches <gcc-patches@gcc.gnu.org>,
Igor Zamyatin <izamyatin@gmail.com>
Subject: Re: [PATCH] Yet another simple fix to enhance outer-loop vectorization.
Date: Tue, 16 Jun 2015 14:39:00 -0000 [thread overview]
Message-ID: <CAEoMCqSVg4DvwFhuynQ29o4cTDwiLDbc9dR_QWewDKUvP6dHfQ@mail.gmail.com> (raw)
In-Reply-To: <CAFiYyc2366dfJBRZQgvehnpdA98Hv43n8AP4VLAjcAhDma=joA@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 4849 bytes --]
Thanks a lot Richard for your review.
I presented updated patch which is not gated by force_vectorize.
I added test on outer-loop in vect_enhance_data_refs_alignment
and it returns false for it because we can not improve dr alighment
through outer-loop peeling in general. So I assume that only
versioning for alignment can be applied for targets do not support
unaligned memory access.
I did not change tests for outer loops in slpeel_can_duplicate_loop_p
as you proposed since it is not called outside vectorization.
I also noticed one not-resolved issue with outer-loop peeling - we don't
consider remainder for possible vectorization of inner-loop as we can see
on the following example:
for (i = 0; i < n; i++) {
diff = 0;
for (j = 0; j < M; j++) {
diff += in[j+i]*coeff[j];
}
out[i] = diff;
}
Is it worth to fix it?
2015-06-16 Yuri Rumyantsev <ysrumyan@gmail.com>
* tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
to allow renaming of PHI arguments on edges incoming from outer
loop header, add corresponding check before start PHI iterator.
(slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
with true force_vectorize. Set-up dominator for outer loop too.
Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
(slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
was marked with force_vectorize and has restricted cfg.
* tree-vect-data-refs.c (vector_alignment_reachable_p): Alignment can
not be reachable for outer loops.
(vect_enhance_data_refs_alignment): Add test on true value of
do_peeling.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-outer-simd-2.c: New test.
2015-06-09 16:26 GMT+03:00 Richard Biener <richard.guenther@gmail.com>:
> On Mon, Jun 8, 2015 at 12:27 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote:
>> Hi All,
>>
>> Here is a simple fix which allows duplication of outer loops to
>> perform peeling for number of iterations if outer loop is marked with
>> pragma omp simd.
>>
>> Bootstrap and regression testing did not show any new failures.
>> Is it OK for trunk?
>
> Hmm, I don't remember needing to adjust rename_variables_in_bb
> when teaching loop distibution to call slpeel_tree_duplicate_to_edge_cfg
> on non-innermost loops... (I just copied, I never called
> slpeel_can_duplicate_loop_p though).
>
> So - you should just remove the loop->inner condition from
> slpeel_can_duplicate_loop_p as it is used by non-vectorizer
> code as well (yeah, I never merged the nested loop support
> for loop distribution...).
>
> Index: tree-vect-loop.c
> ===================================================================
> --- tree-vect-loop.c (revision 224100)
> +++ tree-vect-loop.c (working copy)
> @@ -1879,6 +1879,10 @@
> return false;
> }
>
> + /* Peeling for alignment is not supported for outer-loop vectorization. */
> + if (LOOP_VINFO_LOOP (loop_vinfo)->inner)
> + LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) = 0;
>
> I think you can't simply do this - if vect_enhance_data_refs_alignment
> decided to peel for alignment then it has adjusted the DRs alignment
> info already. So instead of the above simply disallow peeling for
> alignment in vect_enhance_data_refs_alignment? Thus add
> || ->inner to
>
> /* Check if we can possibly peel the loop. */
> if (!vect_can_advance_ivs_p (loop_vinfo)
> || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
> do_peeling = false;
>
> ?
>
> I also can't see why the improvement has to be gated on force_vect,
> it surely looks profitable to enable more outer loop vectorization in
> general, no?
>
> How do the cost model calculations end up with peeling the outer loop
> for niter?
>
> On targets which don't support unaligned accesses we're left with
> versioning for alignment. Isn't peeling for alignment better there?
> Thus only disallow peeling for alignment if there is no unhandled
> alignment?
>
> Thanks,
> Richard.
>
>> ChangeLog:
>>
>> 2015-06-08 Yuri Rumyantsev <ysrumyan@gmail.com>
>>
>> * tree-vect-loop-manip.c (rename_variables_in_bb): Add argument
>> to allow renaming of PHI arguments on edges incoming from outer
>> loop header, add corresponding check before start PHI iterator.
>> (slpeel_tree_duplicate_loop_to_edge_cfg): Introduce new bool
>> variable DUPLICATE_OUTER_LOOP and set it to true for outer loops
>> with true force_vectorize. Set-up dominator for outer loop too.
>> Pass DUPLICATE_OUTER_LOOP as argument to rename_variables_in_bb.
>> (slpeel_can_duplicate_loop_p): Allow duplicate of outer loop if it
>> was marked with force_vectorize and has restricted cfg.
>> * tre-vect-loop.c (vect_analyze_loop_2): Prohibit alignment peeling
>> for outer loops.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.dg/vect/vect-outer-simd-2.c: New test.
[-- Attachment #2: patch.1.2 --]
[-- Type: application/octet-stream, Size: 6291 bytes --]
Index: testsuite/gcc.dg/vect/vect-outer-simd-2.c
===================================================================
--- testsuite/gcc.dg/vect/vect-outer-simd-2.c (revision 0)
+++ testsuite/gcc.dg/vect/vect-outer-simd-2.c (working copy)
@@ -0,0 +1,75 @@
+/* { dg-require-effective-target vect_simd_clones } */
+/* { dg-additional-options "-fopenmp-simd -ffast-math" } */
+#include <stdlib.h>
+#include "tree-vect.h"
+#define N 64
+
+float *px, *py;
+float *tx, *ty;
+float *x1, *z1, *t1, *t2;
+
+static void inline bar (const float cx, float cy,
+ float *vx, float *vy)
+{
+ int j;
+ for (j = 0; j < N; ++j)
+ {
+ const float dx = cx - px[j];
+ const float dy = cy - py[j];
+ *vx -= dx * tx[j];
+ *vy -= dy * ty[j];
+ }
+}
+
+__attribute__((noinline, noclone)) void foo1 (int n)
+{
+ int i;
+#pragma omp simd
+ for (i=0; i<n; i++)
+ bar (px[i], py[i], x1+i, z1+i);
+}
+
+__attribute__((noinline, noclone)) void foo2 (int n)
+{
+ volatile int i;
+ for (i=0; i<n; i++)
+ bar (px[i], py[i], x1+i, z1+i);
+}
+
+
+int main ()
+{
+ float *X = (float*)malloc (N * 8 * sizeof (float));
+ int i;
+ int n = N - 1;
+ check_vect ();
+ px = &X[0];
+ py = &X[N * 1];
+ tx = &X[N * 2];
+ ty = &X[N * 3];
+ x1 = &X[N * 4];
+ z1 = &X[N * 5];
+ t1 = &X[N * 6];
+ t2 = &X[N * 7];
+
+ for (i=0; i<N; i++)
+ {
+ px[i] = (float) (i+2);
+ tx[i] = (float) (i+1);
+ py[i] = (float) (i+4);
+ ty[i] = (float) (i+3);
+ x1[i] = z1[i] = 1.0f;
+ }
+ foo1 (n); /* vector variant. */
+ for (i=0; i<N;i++)
+ {
+ t1[i] = x1[i]; x1[i] = 1.0f;
+ t2[i] = z1[i]; z1[i] = 1.0f;
+ }
+ foo2 (n); /* scalar variant. */
+ for (i=0; i<N; i++)
+ if (x1[i] != t1[i] || z1[i] != t2[i])
+ abort ();
+ return 0;
+}
+/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */
Index: tree-vect-data-refs.c
===================================================================
--- tree-vect-data-refs.c (revision 224100)
+++ tree-vect-data-refs.c (working copy)
@@ -998,7 +998,12 @@
gimple stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ /* Peeling of outer loops can't improve alignment. */
+ if (loop_vinfo && LOOP_VINFO_LOOP (loop_vinfo)->inner)
+ return false;
+
if (STMT_VINFO_GROUPED_ACCESS (stmt_info))
{
/* For interleaved access we peel only if number of iterations in
@@ -1537,8 +1542,9 @@
}
/* Check if we can possibly peel the loop. */
- if (!vect_can_advance_ivs_p (loop_vinfo)
- || !slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
+ if (do_peeling
+ && (!vect_can_advance_ivs_p (loop_vinfo)
+ || !slpeel_can_duplicate_loop_p (loop, single_exit (loop))))
do_peeling = false;
if (do_peeling
Index: tree-vect-loop-manip.c
===================================================================
--- tree-vect-loop-manip.c (revision 224100)
+++ tree-vect-loop-manip.c (working copy)
@@ -97,10 +97,12 @@
}
-/* Renames the variables in basic block BB. */
+/* Renames the variables in basic block BB. Allow renaming of PHI argumnets
+ on edges incoming from outer-block header if RENAME_FROM_OUTER_LOOP is
+ true. */
static void
-rename_variables_in_bb (basic_block bb)
+rename_variables_in_bb (basic_block bb, bool rename_from_outer_loop)
{
gimple stmt;
use_operand_p use_p;
@@ -108,7 +110,14 @@
edge e;
edge_iterator ei;
struct loop *loop = bb->loop_father;
+ struct loop *outer_loop = NULL;
+ if (rename_from_outer_loop)
+ {
+ gcc_assert (loop);
+ outer_loop = loop_outer (loop);
+ }
+
for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
gsi_next (&gsi))
{
@@ -119,7 +128,8 @@
FOR_EACH_EDGE (e, ei, bb->preds)
{
- if (!flow_bb_inside_loop_p (loop, e->src))
+ if (!flow_bb_inside_loop_p (loop, e->src)
+ && (!rename_from_outer_loop || e->src != outer_loop->header))
continue;
for (gphi_iterator gsi = gsi_start_phis (bb); !gsi_end_p (gsi);
gsi_next (&gsi))
@@ -775,6 +785,7 @@
bool was_imm_dom;
basic_block exit_dest;
edge exit, new_exit;
+ bool duplicate_outer_loop = false;
exit = single_exit (loop);
at_exit = (e == exit);
@@ -786,7 +797,9 @@
bbs = XNEWVEC (basic_block, scalar_loop->num_nodes + 1);
get_loop_body_with_size (scalar_loop, bbs, scalar_loop->num_nodes);
-
+ /* Allow duplication of outer loops. */
+ if (scalar_loop->inner)
+ duplicate_outer_loop = true;
/* Check whether duplication is possible. */
if (!can_copy_bbs_p (bbs, scalar_loop->num_nodes))
{
@@ -855,7 +868,7 @@
redirect_edge_and_branch_force (e, new_preheader);
flush_pending_stmts (e);
set_immediate_dominator (CDI_DOMINATORS, new_preheader, e->src);
- if (was_imm_dom)
+ if (was_imm_dom || duplicate_outer_loop)
set_immediate_dominator (CDI_DOMINATORS, exit_dest, new_exit->src);
/* And remove the non-necessary forwarder again. Keep the other
@@ -898,7 +911,7 @@
}
for (unsigned i = 0; i < scalar_loop->num_nodes + 1; i++)
- rename_variables_in_bb (new_bbs[i]);
+ rename_variables_in_bb (new_bbs[i], duplicate_outer_loop);
if (scalar_loop != loop)
{
@@ -985,7 +998,9 @@
(3) it is single entry, single exit
(4) its exit condition is the last stmt in the header
(5) E is the entry/exit edge of LOOP.
- */
+ Allow duplication of outer loops if it is consists of exactly 5 basic
+ blocks.
+ Other conditions are taken above. */
bool
slpeel_can_duplicate_loop_p (const struct loop *loop, const_edge e)
@@ -995,6 +1010,12 @@
gcond *orig_cond = get_loop_exit_condition (loop);
gimple_stmt_iterator loop_exit_gsi = gsi_last_bb (exit_e->src);
+ if (loop->inner && loop->num_nodes == 5
+ && empty_block_p (loop->latch)
+ && single_exit (loop) && (e == exit_e || e == entry_e)
+ && orig_cond && orig_cond == gsi_stmt (loop_exit_gsi))
+ return true;
+
if (loop->inner
/* All loops have an outer scope; the only case loop->outer is NULL is for
the function itself. */
next prev parent reply other threads:[~2015-06-16 14:12 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-08 10:43 Yuri Rumyantsev
2015-06-09 13:28 ` Richard Biener
2015-06-16 14:39 ` Yuri Rumyantsev [this message]
2015-06-17 12:28 ` Richard Biener
2015-06-17 17:13 ` Yuri Rumyantsev
2015-06-29 16:53 ` Yuri Rumyantsev
2015-07-14 11:00 ` Richard Biener
2015-07-14 11:00 ` Richard Biener
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAEoMCqSVg4DvwFhuynQ29o4cTDwiLDbc9dR_QWewDKUvP6dHfQ@mail.gmail.com \
--to=ysrumyan@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=izamyatin@gmail.com \
--cc=richard.guenther@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).