* [PATCH] Simple fix to enhance outer-loop vectorization. @ 2015-05-28 11:32 Yuri Rumyantsev 2015-05-28 11:56 ` Richard Biener 0 siblings, 1 reply; 4+ messages in thread From: Yuri Rumyantsev @ 2015-05-28 11:32 UTC (permalink / raw) To: gcc-patches, Igor Zamyatin, Richard Biener [-- Attachment #1: Type: text/plain, Size: 568 bytes --] Hi All, Here is a simple patch which removes restriction on outer-loop vectorization - allow references in inner-loop with zero step. This case was found in one important benchmark. Bootstrap and regression testing did not show any new failures. Is it OK for trunk. ChangeLog: 2015-05-28 Yuri Rumyantsev <ysrumyan@gmail.com> * tree-vect-data-refs.c (vect_analyze_data_ref_access): Allow consecutive accesses within outer-loop vectorization for references with zero step in inner-loop. gcc/testsuite/ChangeLog: * gcc.dg/vect/fast-math-vect-outer-1.c: New test. [-- Attachment #2: patch --] [-- Type: application/octet-stream, Size: 1886 bytes --] Index: testsuite/gcc.dg/vect/fast-math-vect-outer-1.c =================================================================== --- testsuite/gcc.dg/vect/fast-math-vect-outer-1.c (revision 0) +++ testsuite/gcc.dg/vect/fast-math-vect-outer-1.c (working copy) @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd" } */ + +extern float *px, *py, *tx, *ty, *x, *y; +#define N 64 +static void inline bar(float cx, float cy, float *vx, float *vy) +{ + int j; + for (j = 0; j < N; ++j) + { + const float dx = cx - px[j]; + const float dy = cy - py[j]; + *vx += dx * tx[j]; + *vy -= dy * ty[j]; + } +} + +void foo () +{ + int i; + +#pragma omp simd + for (i = 0; i < N; i++) + bar(px[i],py[i],x+i,y+i); +} + +/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 223653) +++ tree-vect-data-refs.c (working copy) @@ -2261,7 +2261,6 @@ return true; } - /* Analyze the access pattern of the data-reference DR. In case of non-consecutive accesses call vect_analyze_group_access() to analyze groups of accesses. */ @@ -2291,14 +2290,8 @@ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; - if (nested_in_vect_loop_p (loop, stmt)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "zero step in inner loop of nest\n"); - return false; - } - return DR_IS_READ (dr); + if (!nested_in_vect_loop_p (loop, stmt)) + return DR_IS_READ (dr); } if (loop && nested_in_vect_loop_p (loop, stmt)) ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Simple fix to enhance outer-loop vectorization. 2015-05-28 11:32 [PATCH] Simple fix to enhance outer-loop vectorization Yuri Rumyantsev @ 2015-05-28 11:56 ` Richard Biener 2015-05-28 17:15 ` Yuri Rumyantsev 0 siblings, 1 reply; 4+ messages in thread From: Richard Biener @ 2015-05-28 11:56 UTC (permalink / raw) To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin On Thu, May 28, 2015 at 1:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Hi All, > > Here is a simple patch which removes restriction on outer-loop > vectorization - allow references in inner-loop with zero step. This > case was found in one important benchmark. > > Bootstrap and regression testing did not show any new failures. > Is it OK for trunk. > > ChangeLog: > 2015-05-28 Yuri Rumyantsev <ysrumyan@gmail.com> > > * tree-vect-data-refs.c (vect_analyze_data_ref_access): Allow > consecutive accesses within outer-loop vectorization for references > with zero step in inner-loop. > > gcc/testsuite/ChangeLog: > * gcc.dg/vect/fast-math-vect-outer-1.c: New test. Can you please add a non-omp-simd testcase that triggers this as well and that is a runtime testcase verifying the transform is correct? Also please don't add to the strange testcase-name machinery but just use { dg-additional-options "-ffast-math" } Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 223653) +++ tree-vect-data-refs.c (working copy) @@ -2261,7 +2261,6 @@ return true; } - /* Analyze the access pattern of the data-reference DR. In case of non-consecutive accesses call vect_analyze_group_access() to analyze groups of accesses. */ spurious white-space change @@ -2291,14 +2290,8 @@ if (loop_vinfo && integer_zerop (step)) Surely the comment before this needs updating now. { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; - if (nested_in_vect_loop_p (loop, stmt)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "zero step in inner loop of nest\n"); - return false; - } - return DR_IS_READ (dr); + if (!nested_in_vect_loop_p (loop, stmt)) + return DR_IS_READ (dr); } if (loop && nested_in_vect_loop_p (loop, stmt)) so what happens after the patch? It would be nice to have a comment explaining what happens in the nested_in_vect_loop_p case for the case when the outer-loop step is zero and when it is not zero. In particular as you don't need any code generation changes - this hints at that you may miss something ;) Otherwise of course the patch is ok - lifting restrictions is good. Thanks, Richard. ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Simple fix to enhance outer-loop vectorization. 2015-05-28 11:56 ` Richard Biener @ 2015-05-28 17:15 ` Yuri Rumyantsev 2015-06-01 10:51 ` Richard Biener 0 siblings, 1 reply; 4+ messages in thread From: Yuri Rumyantsev @ 2015-05-28 17:15 UTC (permalink / raw) To: Richard Biener; +Cc: gcc-patches, Igor Zamyatin [-- Attachment #1: Type: text/plain, Size: 3162 bytes --] Richard, First of all, I don't think that it is possible to write out test for outer-loop vectorization with zero-step reference because of possible loop-carried dependencies and run-time aliasing is not supported for outer-loop. If there are no such dependencies pre or pdse does hoisting (sinking) of such invariant references. So I add a check on it to accept zero-step references for outer loop marked with forc-vectorize flag to guarantee absence of loop-carried dependencies between inner-loop iterations. I included run-time test that checks vectorization correctness. Update patch is attached. Yuri.. 2015-05-28 14:39 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: > On Thu, May 28, 2015 at 1:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >> Hi All, >> >> Here is a simple patch which removes restriction on outer-loop >> vectorization - allow references in inner-loop with zero step. This >> case was found in one important benchmark. >> >> Bootstrap and regression testing did not show any new failures. >> Is it OK for trunk. >> >> ChangeLog: >> 2015-05-28 Yuri Rumyantsev <ysrumyan@gmail.com> >> >> * tree-vect-data-refs.c (vect_analyze_data_ref_access): Allow >> consecutive accesses within outer-loop vectorization for references >> with zero step in inner-loop. >> >> gcc/testsuite/ChangeLog: >> * gcc.dg/vect/fast-math-vect-outer-1.c: New test. > > Can you please add a non-omp-simd testcase that triggers this as well and that > is a runtime testcase verifying the transform is correct? > > Also please don't add to the strange testcase-name machinery but just > use { dg-additional-options "-ffast-math" } > > Index: tree-vect-data-refs.c > =================================================================== > --- tree-vect-data-refs.c (revision 223653) > +++ tree-vect-data-refs.c (working copy) > @@ -2261,7 +2261,6 @@ > return true; > } > > - > /* Analyze the access pattern of the data-reference DR. > In case of non-consecutive accesses call vect_analyze_group_access() to > analyze groups of accesses. */ > > spurious white-space change > > > @@ -2291,14 +2290,8 @@ > if (loop_vinfo && integer_zerop (step)) > > Surely the comment before this needs updating now. > > { > GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; > - if (nested_in_vect_loop_p (loop, stmt)) > - { > - if (dump_enabled_p ()) > - dump_printf_loc (MSG_NOTE, vect_location, > - "zero step in inner loop of nest\n"); > - return false; > - } > - return DR_IS_READ (dr); > + if (!nested_in_vect_loop_p (loop, stmt)) > + return DR_IS_READ (dr); > } > > if (loop && nested_in_vect_loop_p (loop, stmt)) > > so what happens after the patch? It would be nice to have a comment > explaining what happens in the nested_in_vect_loop_p case for > the case when the outer-loop step is zero and when it is not zero. > > In particular as you don't need any code generation changes - this hints > at that you may miss something ;) > > Otherwise of course the patch is ok - lifting restrictions is good. > > Thanks, > Richard. [-- Attachment #2: patch.1 --] [-- Type: application/octet-stream, Size: 3186 bytes --] Index: testsuite/gcc.dg/vect/vect-outer-simd-1.c =================================================================== --- testsuite/gcc.dg/vect/vect-outer-simd-1.c (revision 0) +++ testsuite/gcc.dg/vect/vect-outer-simd-1.c (working copy) @@ -0,0 +1,83 @@ +/* { dg-do run } */ +/* { dg-require-effective-target vect_float } */ +/* { dg-require-effective-target vect_simd_clones } */ +/* { dg-additional-options "-fopenmp-simd -ffast-math" } */ + +#include <stdio.h> +#include <stdlib.h> +#include "tree-vect.h" +#define N 64 + +float *px, *py; +float *tx, *ty; +float *x1, *z1, *t1, *t2; + +static void inline bar(const float cx, float cy, + float *vx, float *vy) +{ + int j; + for (j = 0; j < N; ++j) + { + const float dx = cx - px[j]; + const float dy = cy - py[j]; + *vx -= dx * tx[j]; + *vy -= dy * ty[j]; + } +} + +__attribute__((noinline, noclone)) void foo1 () +{ + int i; +#pragma omp simd + for (i=0; i<N; i++) + bar(px[i], py[i], x1+i, z1+i); +} + +__attribute__((noinline, noclone)) void foo2 () +{ + volatile int i; + for (i=0; i<N; i++) + bar(px[i], py[i], x1+i, z1+i); +} + + +int main() +{ + float *X = (float*)malloc(N * 8 * sizeof (float)); + int i; + check_vect (); + px = &X[0]; + py = &X[N * 1]; + tx = &X[N * 2]; + ty = &X[N * 3]; + x1 = &X[N * 4]; + z1 = &X[N * 5]; + t1 = &X[N * 6]; + t2 = &X[N * 7]; + + for (i=0; i<N; i++) + { + px[i] = (float) (i+2); + tx[i] = (float) (i+1); + py[i] = (float) (i+4); + ty[i] = (float) (i+3); + x1[i] = z1[i] = 1.0f; + } + foo1 (); /* vector variant. */ + for (i=0; i<N;i++) + { + t1[i] = x1[i]; x1[i] = 1.0f; + t2[i] = z1[i]; z1[i] = 1.0f; + } + foo2 (); /* scalar variant. */ + for (i=0; i<N; i++) + if (x1[i] != t1[i] || z1[i] != t2[i]) + { + fprintf (stderr, "Miscompare for index =%d!\n"); + return 1; + } + fprintf (stderr, "Test passed!\n"); + return 0; +} +/* { dg-final { scan-tree-dump "OUTER LOOP VECTORIZED" "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ Index: tree-vect-data-refs.c =================================================================== --- tree-vect-data-refs.c (revision 223653) +++ tree-vect-data-refs.c (working copy) @@ -2287,18 +2287,17 @@ return false; } - /* Allow invariant loads in not nested loops. */ + /* Allow loads with zero step in inner-loop vectorization. */ if (loop_vinfo && integer_zerop (step)) { GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; - if (nested_in_vect_loop_p (loop, stmt)) - { - if (dump_enabled_p ()) - dump_printf_loc (MSG_NOTE, vect_location, - "zero step in inner loop of nest\n"); - return false; - } - return DR_IS_READ (dr); + if (!nested_in_vect_loop_p (loop, stmt)) + return DR_IS_READ (dr); + /* Allow references with zero step for outer loops marked + with pragma omp simd only - it guarantees absence of + loop-carried dependencies between inner loop iterations. */ + if (!loop->force_vectorize) + return false; } if (loop && nested_in_vect_loop_p (loop, stmt)) ^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH] Simple fix to enhance outer-loop vectorization. 2015-05-28 17:15 ` Yuri Rumyantsev @ 2015-06-01 10:51 ` Richard Biener 0 siblings, 0 replies; 4+ messages in thread From: Richard Biener @ 2015-06-01 10:51 UTC (permalink / raw) To: Yuri Rumyantsev; +Cc: gcc-patches, Igor Zamyatin On Thu, May 28, 2015 at 5:51 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: > Richard, > > First of all, I don't think that it is possible to write out test for > outer-loop vectorization with zero-step reference because of possible > loop-carried dependencies and run-time aliasing is not supported for > outer-loop. If there are no such dependencies pre or pdse does > hoisting (sinking) of such invariant references. So I add a check on > it to accept zero-step references for outer loop marked with > forc-vectorize flag to guarantee absence of loop-carried dependencies > between inner-loop iterations. > I included run-time test that checks vectorization correctness. > > Update patch is attached. Please don't use fprintf from testcases but just call abort () when you detect an error. gcc.dg/vect testcases shouldn't have an explicit dg-do run, just drop it, it is implicit. Ok with that changes. Thanks, Richard. > Yuri.. > > 2015-05-28 14:39 GMT+03:00 Richard Biener <richard.guenther@gmail.com>: >> On Thu, May 28, 2015 at 1:00 PM, Yuri Rumyantsev <ysrumyan@gmail.com> wrote: >>> Hi All, >>> >>> Here is a simple patch which removes restriction on outer-loop >>> vectorization - allow references in inner-loop with zero step. This >>> case was found in one important benchmark. >>> >>> Bootstrap and regression testing did not show any new failures. >>> Is it OK for trunk. >>> >>> ChangeLog: >>> 2015-05-28 Yuri Rumyantsev <ysrumyan@gmail.com> >>> >>> * tree-vect-data-refs.c (vect_analyze_data_ref_access): Allow >>> consecutive accesses within outer-loop vectorization for references >>> with zero step in inner-loop. >>> >>> gcc/testsuite/ChangeLog: >>> * gcc.dg/vect/fast-math-vect-outer-1.c: New test. >> >> Can you please add a non-omp-simd testcase that triggers this as well and that >> is a runtime testcase verifying the transform is correct? >> >> Also please don't add to the strange testcase-name machinery but just >> use { dg-additional-options "-ffast-math" } >> >> Index: tree-vect-data-refs.c >> =================================================================== >> --- tree-vect-data-refs.c (revision 223653) >> +++ tree-vect-data-refs.c (working copy) >> @@ -2261,7 +2261,6 @@ >> return true; >> } >> >> - >> /* Analyze the access pattern of the data-reference DR. >> In case of non-consecutive accesses call vect_analyze_group_access() to >> analyze groups of accesses. */ >> >> spurious white-space change >> >> >> @@ -2291,14 +2290,8 @@ >> if (loop_vinfo && integer_zerop (step)) >> >> Surely the comment before this needs updating now. >> >> { >> GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt)) = NULL; >> - if (nested_in_vect_loop_p (loop, stmt)) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_NOTE, vect_location, >> - "zero step in inner loop of nest\n"); >> - return false; >> - } >> - return DR_IS_READ (dr); >> + if (!nested_in_vect_loop_p (loop, stmt)) >> + return DR_IS_READ (dr); >> } >> >> if (loop && nested_in_vect_loop_p (loop, stmt)) >> >> so what happens after the patch? It would be nice to have a comment >> explaining what happens in the nested_in_vect_loop_p case for >> the case when the outer-loop step is zero and when it is not zero. >> >> In particular as you don't need any code generation changes - this hints >> at that you may miss something ;) >> >> Otherwise of course the patch is ok - lifting restrictions is good. >> >> Thanks, >> Richard. ^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2015-06-01 10:51 UTC | newest] Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-05-28 11:32 [PATCH] Simple fix to enhance outer-loop vectorization Yuri Rumyantsev 2015-05-28 11:56 ` Richard Biener 2015-05-28 17:15 ` Yuri Rumyantsev 2015-06-01 10:51 ` Richard Biener
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).