[patch] [4.3 projects] outer-loop vectorization
From: Dorit Nuzman @ 2007-08-07 20:58 UTC
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 3737 bytes --]
Hi,
This patch brings over from autovect-branch the ability to vectorize
outer-loops (doubly-nested loops).
Here's an example of a loop that can be vectorized with this
optimization (it's an FIR filter, extremely common in multimedia
applications):
  for (i=0; i<N; i++){
    s=0;
    for (j=0; j<M; j+=4)
      s += in[i+j] * coeffs[j];
    out[i]=s;
  }
...when outer-loop vectorized, it would look something like this:
  for (i=0; i<N; i+=4){
    vs=[0,0,0,0]
    for (j=0; j<M; j+=4)
      vs += in[i+j,(i+1)+j,(i+2)+j,(i+3)+j] * coeffs[j,j,j,j]
    out[i,i+1,i+2,i+3] = vs
  }
Note that the inner-loop still executes all its iterations sequentially, but
in each inner-loop iteration we now do 4 consecutive outer-loop iterations
together.
Why is it better to vectorize the outer-loop in this case, rather than the
inner-loop?
* Correctness: sometimes you can't vectorize the inner-loop, because it
computes a reduction (e.g., a summation of floats, as in the example above)
and you are not allowed to change the order of the computation.
Inner-loop vectorization would change that order, whereas with outer-loop
vectorization the reduction in the inner-loop is computed in the original
order (in parallel for 4 different outputs).
* Performance: Inner-loop vectorization would create a reduction epilog (to
reduce the vector of partial results into a scalar result) after the
inner-loop (executed in each outer-loop iteration). Outer-loop
vectorization does not require a reduction epilog after the inner-loop. And
of course, with outer-loop vectorization everything is vectorized (both in
and out of the inner-loop).
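For contrast, here is a rough sketch of what inner-loop vectorization of the
same kernel would look like (my own illustration in the same pseudo-notation
as above, assuming M is a multiple of 16; it is not taken from the patch).
Note the stride-4 gathers that the j+=4 step would force, and the reduction
epilog that both changes the summation order and runs in every outer-loop
iteration:
  for (i=0; i<N; i++){
    vs=[0,0,0,0]
    for (j=0; j<M; j+=16)
      vs += in[i+j,i+j+4,i+j+8,i+j+12] * coeffs[j,j+4,j+8,j+12]
    s = vs[0]+vs[1]+vs[2]+vs[3]   /* reduction epilog */
    out[i]=s
  }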
There are potentially other ways to vectorize outer-loops. One is to perform
loop-interchange, but note that we have a non-perfect nest here, which makes
it much more difficult both to interchange and to vectorize afterwards.
Another, I think more promising, way is to do unroll-and-jam (unroll the
outer-loop and jam together the copies of the inner-loop), and then apply SLP
in the outer-loop (and inner-loop) on top of that. Operating on the
outer-loop as a whole has the advantage of a global view of the memory
ranges accessed in different loop-iterations, which can be used to exploit
data-reuse and improve alignment handling. Potentially such optimizations
could also be implemented separately, after vectorization, on top of SLP-ed
code. When we have unroll-and-jam and mature enough SLP in GCC, it would be
interesting to give that a try and compare the two schemes.
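To make the unroll-and-jam alternative concrete, here is a hand-written
sketch (plain C, my own illustration assuming N is a multiple of 4; it is not
part of the patch) of the FIR kernel after unrolling the outer-loop by 4 and
jamming the copies of the inner-loop, leaving four isomorphic statements in
the inner-loop body for SLP to pick up:
  for (i=0; i<N; i+=4){
    float s0=0, s1=0, s2=0, s3=0;
    for (j=0; j<M; j+=4){
      s0 += in[i+j]   * coeffs[j];
      s1 += in[i+1+j] * coeffs[j];
      s2 += in[i+2+j] * coeffs[j];
      s3 += in[i+3+j] * coeffs[j];
    }
    out[i]=s0; out[i+1]=s1; out[i+2]=s2; out[i+3]=s3;
  }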
Attached below is the complete patch (almost half of it is testcases);
however, I plan to commit it in two parts. The first part brings in the most
basic/minimal outer-loop vectorization support - it can vectorize
outer-loops only if there are no memory-references in the inner-loop. The
second part brings in the support for memory-references in the inner-loop.
Each of the patches was bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
and passed full regression testing on both platforms.
I'll submit the two parts tomorrow, and will wait at least a week before
committing them to allow people to review and comment.
The main features that are not supported yet (and that I will continue to
implement) are:
- multiple types in the inner-loop
- strided-accesses in the inner-loop
- things that require peeling/versioning of nested loops (unknown loop
bound in the outer-loop, and sometimes misaligned accesses in the
outer-loop)
- reduction detection improvements
thanks,
dorit
(See attached file: mainlineouterloopdiff123t.txt)
[-- Attachment #2: mainlineouterloopdiff123t.txt --]
[-- Type: text/plain, Size: 247066 bytes --]
Index: testsuite/gcc.dg/vect/vect-widen-mult-sum.c
===================================================================
*** testsuite/gcc.dg/vect/vect-widen-mult-sum.c (revision 127202)
--- testsuite/gcc.dg/vect/vect-widen-mult-sum.c (working copy)
*************** int main (void)
*** 42,45 ****
--- 42,46 ----
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-2b.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-2b.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-2b.c (revision 0)
***************
*** 0 ****
--- 1,41 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[2*N][N][N] __attribute__ ((__aligned__(16)));
+
+ void
+ foo (){
+ int i,j,k;
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[k+i][j][i] = j+i+k;
+ }
+ }
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j, k;
+
+ foo ();
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ if (image[k+i][j][i] != j+i+k)
+ abort ();
+ }
+ }
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "strided access in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4.c (revision 0)
***************
*** 0 ****
--- 1,55 ----
+ /* { dg-require-effective-target vect_float } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ float in[N+M];
+ float coeff[M];
+ float out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=4) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < M; i++)
+ coeff[i] = i;
+ for (i = 0; i < N+M; i++)
+ in[i] = i;
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=4) {
+ diff += in[j+i]*coeff[j];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-7.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
***************
*** 0 ****
--- 1,75 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+ unsigned short coef[N];
+ unsigned short a[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j;
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = 2*i;
+ coef[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ if (a[i] != bar ())
+ abort ();
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4g.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4g.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4g.c (revision 0)
***************
*** 0 ****
--- 1,70 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned int out[N];
+ unsigned char arr[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff;
+ unsigned int s=0;
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=diff;
+ }
+ return s;
+ }
+
+ unsigned int
+ bar (int i, unsigned int diff, unsigned short *in)
+ {
+ int j;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ return diff;
+ }
+
+ int main (void)
+ {
+ int i, j;
+ unsigned int diff;
+ unsigned int s=0,sum=0;
+
+ check_vect ();
+
+ for (i = 0; i < N+M; i++) {
+ in[i] = i;
+ }
+
+ sum=foo ();
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ diff = bar (i, diff, in);
+ s += diff;
+ }
+
+ if (s != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ j = 0;
+ do {
+ sum += j;
+ } while (++j < n);
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-18.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum;
+ a[2*i+1] = 2*sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum || a[2*i+1] != 2*sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-3a.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-3a.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-3a.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N+1] __attribute__ ((__aligned__(16)));
+ float out[N];
+
+ /* Outer-loop vectorization with misaligned accesses in the inner-loop. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[i][j]=i+j;
+ }
+ }
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][i];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+ /* { dg-final { scan-tree-dump-times "step doesn't divide the vector-size" 2 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-5.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-5.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-5.c (revision 0)
***************
*** 0 ****
--- 1,80 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include <signal.h>
+ #include "tree-vect.h"
+
+ #define N 64
+ #define MAX 42
+
+ extern void abort(void);
+
+ int main1 ()
+ {
+ float A[N] __attribute__ ((__aligned__(16)));
+ float B[N] __attribute__ ((__aligned__(16)));
+ float C[N] __attribute__ ((__aligned__(16)));
+ float D[N] __attribute__ ((__aligned__(16)));
+ float s;
+
+ int i, j;
+
+ for (i = 0; i < N; i++)
+ {
+ A[i] = i;
+ B[i] = i;
+ C[i] = i;
+ D[i] = i;
+ }
+
+ /* Outer-loop 1: Vectorizable with respect to dependence distance. */
+ for (i = 0; i < N-20; i++)
+ {
+ s = 0;
+ for (j=0; j<N; j+=4)
+ s += C[j];
+ A[i] = A[i+20] + s;
+ }
+
+ /* check results: */
+ for (i = 0; i < N-20; i++)
+ {
+ s = 0;
+ for (j=0; j<N; j+=4)
+ s += C[j];
+ if (A[i] != D[i+20] + s)
+ abort ();
+ }
+
+ /* Outer-loop 2: Not vectorizable because of dependence distance. */
+ for (i = 0; i < 4; i++)
+ {
+ s = 0;
+ for (j=0; j<N; j+=4)
+ s += C[j];
+ B[i] = B[i+3] + s;
+ }
+
+ /* check results: */
+ for (i = 0; i < 4; i++)
+ {
+ s = 0;
+ for (j=0; j<N; j+=4)
+ s += C[j];
+ if (B[i] != D[i+3] + s)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ int main ()
+ {
+ check_vect ();
+ return main1();
+ }
+
+ /* { dg-final { scan-tree-dump-times "not vectorized: possible dependence between data-refs" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-2c.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-2c.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-2c.c (revision 0)
***************
*** 0 ****
--- 1,41 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[2*N][2*N][N] __attribute__ ((__aligned__(16)));
+
+ void
+ foo (){
+ int i,j,k;
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j+=2) {
+ image[k][j][i] = j+i+k;
+ }
+ }
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j, k;
+
+ foo ();
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j+=2) {
+ if (image[k][j][i] != j+i+k)
+ abort ();
+ }
+ }
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-8.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+
+ int
+ foo (int *a){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (a);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c
===================================================================
*** testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c (revision 0)
--- testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-outer-fir.c (revision 0)
***************
*** 0 ****
--- 1,70 ----
+ /* { dg-require-effective-target vect_float } */
+
+ #include <stdarg.h>
+ #include "../../tree-vect.h"
+
+ #define N 32
+ #define M 16
+ float in[N+M];
+ float coeff[M];
+ float out[N];
+ float fir_out[N];
+
+ /* Vectorized. Fixed misalignment in the inner-loop. */
+ void foo (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ out[i] = 0;
+ }
+
+ for (k = 0; k < 4; k++) {
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = k; j < M; j+=4) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i] += diff;
+ }
+ }
+ }
+
+ /* Vectorized. Changing misalignment in the inner-loop. */
+ void fir (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j++) {
+ diff += in[j+i]*coeff[j];
+ }
+ fir_out[i] = diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < M; i++)
+ coeff[i] = i;
+ for (i = 0; i < N+M; i++)
+ in[i] = i;
+
+ foo ();
+ fir ();
+
+ for (i = 0; i < N; i++) {
+ if (out[i] != fir_out[i])
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail vect_no_align } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-11.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < n; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
***************
*** 0 ****
--- 1,57 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-19.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short a[N];
+ unsigned int b[N];
+
+ int
+ foo (){
+ unsigned short i,j;
+ unsigned short sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (unsigned int)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ short sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (unsigned int)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-3b.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-3b.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-3b.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N] __attribute__ ((__aligned__(16)));
+ float out[N];
+
+ /* Outer-loop vectorization with non-consecutive access. Not vectorized yet. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N/2; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][2*i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[i][j]=i+j;
+ }
+ }
+
+ foo ();
+
+ for (i = 0; i < N/2; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][2*i];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "strided access in outer loop" 2 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-20.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
***************
*** 0 ****
--- 1,23 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N];
+ signed short block[N][N];
+
+ /* memory references in the inner-loop */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4a.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4a.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4a.c (revision 0)
***************
*** 0 ****
--- 1,31 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ signed short in[N+M];
+ signed short coeff[M];
+ signed short out[N];
+
+ /* Outer-loop vectorization.
+ Currently not vectorized because of multiple-data-types in the inner-loop. */
+
+ void
+ foo (){
+ int i,j;
+ int diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
+ { scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
+
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-6.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-6.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-6.c (revision 0)
***************
*** 0 ****
--- 1,65 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include <signal.h>
+ #include "tree-vect.h"
+
+ #define N 64
+ #define MAX 42
+
+ float A[N] __attribute__ ((__aligned__(16)));
+ float B[N] __attribute__ ((__aligned__(16)));
+ float C[N] __attribute__ ((__aligned__(16)));
+ float D[N] __attribute__ ((__aligned__(16)));
+ extern void abort(void);
+
+ int main1 ()
+ {
+ float s;
+
+ int i, j;
+
+ for (i = 0; i < 8; i++)
+ {
+ s = 0;
+ for (j=0; j<8; j+=4)
+ s += C[j];
+ A[i] = s;
+ }
+
+ return 0;
+ }
+
+ int main ()
+ {
+ int i,j;
+ float s;
+
+ check_vect ();
+
+ for (i = 0; i < N; i++)
+ {
+ A[i] = i;
+ B[i] = i;
+ C[i] = i;
+ D[i] = i;
+ }
+
+ main1();
+
+ /* check results: */
+ for (i = 0; i < 8; i++)
+ {
+ s = 0;
+ for (j=0; j<8; j+=4)
+ s += C[j];
+ if (A[i] != s)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4i.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4i.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4i.c (revision 0)
***************
*** 0 ****
--- 1,28 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ unsigned char in[N+M];
+ unsigned short out[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned short
+ foo (){
+ int i,j;
+ unsigned short diff;
+ unsigned short s=0;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=diff;
+ }
+ return s;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-12.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ int a[N];
+ short b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (short)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (short)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* Until we support multiple types in the inner loop */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-3c.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-3c.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-3c.c (revision 0)
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N+1] __attribute__ ((__aligned__(16)));
+ float out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j+=4) {
+ diff += image[j][i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[i][j]=i+j;
+ }
+ }
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j+=4) {
+ diff += image[j][i];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-fir-lb.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-fir-lb.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-fir-lb.c (revision 0)
***************
*** 0 ****
--- 1,80 ----
+ /* { dg-require-effective-target vect_float } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 64
+ float in[N+M];
+ float coeff[M];
+ float out[N];
+ float fir_out[N];
+
+ /* Should be vectorized. Fixed misalignment in the inner-loop. */
+ /* Currently not vectorized because the loop-count for the inner-loop
+ has a maybe_zero component. Will be fixed when we incorporate the
+ "cond_expr in rhs" patch. */
+ void foo (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ out[i] = 0;
+ }
+
+ for (k = 0; k < 4; k++) {
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ j = k;
+
+ do {
+ diff += in[j+i]*coeff[j];
+ j+=4;
+ } while (j < M);
+
+ out[i] += diff;
+ }
+ }
+
+ }
+
+ /* Vectorized. Changing misalignment in the inner-loop. */
+ void fir (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j++) {
+ diff += in[j+i]*coeff[j];
+ }
+ fir_out[i] = diff;
+ }
+ }
+
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < M; i++)
+ coeff[i] = i;
+ for (i = 0; i < N+M; i++)
+ in[i] = i;
+
+ foo ();
+ fir ();
+
+ for (i = 0; i < N; i++) {
+ if (out[i] != fir_out[i])
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-21.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ #define N 40
+
+ int
+ foo (){
+ int i,j;
+ int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += j;
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4b.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4b.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4b.c (revision 0)
***************
*** 0 ****
--- 1,31 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ signed short in[N+M];
+ signed short coeff[M];
+ int out[N];
+
+ /* Outer-loop vectorization.
+ Currently not vectorized because of multiple-data-types in the inner-loop. */
+
+ void
+ foo (){
+ int i,j;
+ int diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* FORNOW. not vectorized until we support 0-stride accesses like coeff[j]. should be:
+ { scan-tree-dump-not "multiple types in nested loop." "vect" { xfail *-*-* } } } */
+
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-117.c
===================================================================
*** testsuite/gcc.dg/vect/vect-117.c (revision 127202)
--- testsuite/gcc.dg/vect/vect-117.c (working copy)
*************** static int c[N][N] = {{ 1, 2, 3, 4, 5},
*** 20,26 ****
volatile int foo;
! int main1 (int A[N][N])
{
int i,j;
--- 20,26 ----
volatile int foo;
! int main1 (int A[N][N], int n)
{
int i,j;
*************** int main1 (int A[N][N])
*** 28,34 ****
/* vectorizable */
for (i = 1; i < N; i++)
{
! for (j = 0; j < N; j++)
{
A[i][j] = A[i-1][j] + A[i][j];
}
--- 28,34 ----
/* vectorizable */
for (i = 1; i < N; i++)
{
! for (j = 0; j < n; j++)
{
A[i][j] = A[i-1][j] + A[i][j];
}
*************** int main (void)
*** 42,48 ****
int i,j;
foo = 0;
! main1 (a);
/* check results: */
--- 42,48 ----
int i,j;
foo = 0;
! main1 (a, N);
/* check results: */
Index: testsuite/gcc.dg/vect/vect-outer-4j.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4j.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4j.c (revision 0)
***************
*** 0 ****
--- 1,26 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ unsigned char in[N+M];
+ unsigned short out[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ void
+ foo (){
+ int i,j;
+ unsigned short diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-13.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
***************
*** 0 ****
--- 1,67 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar ();
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c
===================================================================
*** testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c (revision 0)
--- testsuite/gcc.dg/vect/no-section-anchors-vect-outer-4h.c (revision 0)
***************
*** 0 ****
--- 1,47 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+
+ #define N 40
+ #define M 128
+ unsigned short a[M][N];
+ unsigned int out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ unsigned int diff;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < M; j++) {
+ a[j][i] = 4;
+ }
+ out[i]=5;
+ }
+ }
+
+ int main (void)
+ {
+ int i, j;
+ check_vect ();
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < M; j++) {
+ if (a[j][i] != 4)
+ abort ();
+ }
+ if (out[i] != 5)
+ abort ();
+ }
+
+ return 0;
+ }
+
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-22.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ /* inner-loop index j used after the inner-loop */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += j;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-3.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ /* inner-loop step > 1 */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j+=2) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4c.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4c.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4c.c (revision 0)
***************
*** 0 ****
--- 1,27 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned short coeff[M];
+ unsigned int out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ unsigned short diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { target vect_short_mult } } } */
+ /* { dg-final { scan-tree-dump-times "zero step in outer loop." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4k.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4k.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4k.c (revision 0)
***************
*** 0 ****
--- 1,70 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned int out[N];
+ unsigned char arr[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff;
+ unsigned int s=0;
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=diff;
+ }
+ return s;
+ }
+
+ unsigned int
+ bar (int i, unsigned int diff, unsigned short *in)
+ {
+ int j;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ return diff;
+ }
+
+ int main (void)
+ {
+ int i, j;
+ unsigned int diff;
+ unsigned int s=0,sum=0;
+
+ check_vect ();
+
+ for (i = 0; i < N+M; i++) {
+ in[i] = i;
+ }
+
+ sum=foo ();
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ diff = bar (i, diff, in);
+ s += diff;
+ }
+
+ if (s != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-14.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
***************
*** 0 ****
--- 1,61 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned short sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += sum_j;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned short sum = 0;
+ unsigned short res;
+
+ check_vect ();
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar();
+ sum += sum_j;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (revision 127202)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (working copy)
***************
*** 1,34 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 26
-
- int main1 (int X)
- {
- int s = X;
- int i;
-
- /* vectorization of reduction with induction.
- Need -fno-tree-scev-cprop or else the loop is eliminated. */
- for (i = 0; i < N; i++)
- s += i;
-
- return s;
- }
-
- int main (void)
- {
- int s;
- check_vect ();
-
- s = main1 (3);
- if (s != 328)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/vect-outer-1.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-1.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-1.c (revision 0)
***************
*** 0 ****
--- 1,26 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N] __attribute__ ((__aligned__(16)));
+ signed short block[N][N] __attribute__ ((__aligned__(16)));
+ signed short out[N] __attribute__ ((__aligned__(16)));
+
+ /* Can't do outer-loop vectorization because of non-consecutive access. */
+
+ void
+ foo (){
+ int i,j;
+ int diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j+=8) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-4.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
***************
*** 0 ****
--- 1,55 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ /* induction variable k advances through inner and outer loops. */
+
+ int
+ foo (int n){
+ int i,j,k=0;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += k++;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += k++;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4d.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4d.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4d.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_float } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ float in[N+M];
+ float out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=4) {
+ diff += in[j+i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < N; i++)
+ in[i] = i;
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=4) {
+ diff += in[j+i];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4l.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4l.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4l.c (revision 0)
***************
*** 0 ****
--- 1,70 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned int out[N];
+ unsigned char arr[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff;
+ unsigned int s=0;
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=diff;
+ }
+ return s;
+ }
+
+ unsigned int
+ bar (int i, unsigned int diff, unsigned short *in)
+ {
+ int j;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ return diff;
+ }
+
+ int main (void)
+ {
+ int i, j;
+ unsigned int diff;
+ unsigned int s=0,sum=0;
+
+ check_vect ();
+
+ for (i = 0; i < N+M; i++) {
+ in[i] = i;
+ }
+
+ sum=foo ();
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ diff = bar (i, diff, in);
+ s += diff;
+ }
+
+ if (s != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-15.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
***************
*** 0 ****
--- 1,48 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int x){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum + i + x;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ foo (3);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum + i + 3)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-1a.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-1a.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-1a.c (revision 0)
***************
*** 0 ****
--- 1,28 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N] __attribute__ ((__aligned__(16)));
+ signed short block[N][N] __attribute__ ((__aligned__(16)));
+
+ /* Can't do outer-loop vectorization because of non-consecutive access.
+ Currently fails to vectorize because the reduction pattern is not
+ recognized. */
+
+ int
+ foo (){
+ int i,j;
+ int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j+=8) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* FORNOW */
+ /* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "unexpected pattern" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (revision 127202)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (working copy)
***************
*** 1,49 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 16
-
- int main1 ()
- {
- int arr1[N];
- int k = 0;
- int m = 3, i = 0;
-
- /* Vectorization of induction that is used after the loop.
- Currently vectorizable because scev_ccp disconnects the
- use-after-the-loop from the iv def inside the loop. */
-
- do {
- k = k + 2;
- arr1[i] = k;
- m = m + k;
- i++;
- } while (i < N);
-
- /* check results: */
- for (i = 0; i < N; i++)
- {
- if (arr1[i] != 2+2*i)
- abort ();
- }
-
- return m + k;
- }
-
- int main (void)
- {
- int res;
-
- check_vect ();
-
- res = main1 ();
- if (res != 32 + 275)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/vect.exp
===================================================================
*** testsuite/gcc.dg/vect/vect.exp (revision 127202)
--- testsuite/gcc.dg/vect/vect.exp (working copy)
*************** dg-runtest [lsort [glob -nocomplain $src
*** 176,183 ****
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-scev-cprop-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
--- 176,195 ----
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-vect-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-outer-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop -fno-tree-reassoc
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop" "-fno-tree-reassoc"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-noreassoc-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
Index: testsuite/gcc.dg/vect/vect-outer-2.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-2.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-2.c (revision 0)
***************
*** 0 ****
--- 1,40 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N] __attribute__ ((__aligned__(16)));
+ float out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[j][i] = j+i;
+ }
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ if (image[j][i] != j+i)
+ abort ();
+ }
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-5.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] += sum + i;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ a[i] = i;
+ aa[i] = i;
+ }
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != aa[i] + sum + i)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ j = 0;
+ do {
+ sum += j;
+ }while (++j < n);
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4e.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4e.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4e.c (revision 0)
***************
*** 0 ****
--- 1,25 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ #define M 128
+ unsigned int in[N+M];
+ unsigned short out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ unsigned int diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ out[i]=(unsigned short)diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4m.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4m.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4m.c (revision 0)
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned int out[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff;
+ unsigned int s=0;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=((unsigned short)diff>>3);
+ }
+ return s;
+ }
+
+ int main (void)
+ {
+ int i, j;
+ unsigned int diff;
+ unsigned int s=0,sum=0;
+
+ check_vect ();
+
+ for (i = 0; i < N+M; i++) {
+ in[i] = i;
+ }
+
+ sum=foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s += ((unsigned short)diff>>3);
+ }
+
+ if (s != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-16.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-1b.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-1b.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-1b.c (revision 0)
***************
*** 0 ****
--- 1,26 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N];
+ signed short block[N][N];
+ signed short out[N];
+
+ /* Outer-loop cannot get vectorized because of non-consecutive access. */
+
+ void
+ foo (){
+ int i,j;
+ int diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j+=4) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ out[i]=diff;
+ }
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "strided access in outer loop" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
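To illustrate the distinction this testcase and vect-outer-3.c (further down in the patch) exercise, namely the step of the data reference with respect to the outer loop, which the patch records in STMT_VINFO_DR_STEP, here is a small sketch. It is not part of the patch; the array and function names are made up and the element type is chosen arbitrarily:

/* When the i-loop is the one being vectorized, what matters is how the
   address of the inner-loop access advances per outer-loop iteration.  */
float img[40][40];
float res[40];

void sketch_steps (void)
{
  int i, j;
  float d;

  for (i = 0; i < 40; i++)
    {
      d = 0;
      for (j = 0; j < 40; j++)
        d += img[j][i];    /* outer step = sizeof (float): consecutive
                              across i, so outer-loop vectorization is
                              possible (as in vect-outer-3.c).  */
      for (j = 0; j < 40; j++)
        d += img[i][j];    /* outer step = 40 * sizeof (float): strided
                              across i, rejected with "strided access in
                              outer loop" (as in this testcase).  */
      res[i] = d;
    }
}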
Index: testsuite/gcc.dg/vect/vect-outer-fir.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-fir.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-fir.c (revision 0)
***************
*** 0 ****
--- 1,77 ----
+ /* { dg-require-effective-target vect_float } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ float in[N+M];
+ float coeff[M];
+ float out[N];
+ float fir_out[N];
+
+ /* Should be vectorized. Fixed misalignment in the inner-loop. */
+ /* Currently not vectorized because we get too many BBs in the inner-loop,
+ because the compiler doesn't realize that the inner-loop executes at
+ least once (since k<4), and so there's no need to create guard code
+ to skip the inner-loop in case it doesn't execute. */
+ void foo (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ out[i] = 0;
+ }
+
+ for (k = 0; k < 4; k++) {
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = k; j < M; j+=4) {
+ diff += in[j+i]*coeff[j];
+ }
+ out[i] += diff;
+ }
+ }
+
+ }
+
+ /* Vectorized. Changing misalignment in the inner-loop. */
+ void fir (){
+ int i,j,k;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < M; j++) {
+ diff += in[j+i]*coeff[j];
+ }
+ fir_out[i] = diff;
+ }
+ }
+
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < M; i++)
+ coeff[i] = i;
+ for (i = 0; i < N+M; i++)
+ in[i] = i;
+
+ foo ();
+ fir ();
+
+ for (i = 0; i < N; i++) {
+ if (out[i] != fir_out[i])
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-2a.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-2a.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-2a.c (revision 0)
***************
*** 0 ****
--- 1,41 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N][N] __attribute__ ((__aligned__(16)));
+
+ void
+ foo (){
+ int i,j,k;
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[k][j][i] = j+i+k;
+ }
+ }
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j, k;
+
+ foo ();
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ if (image[k][j][i] != j+i+k)
+ abort ();
+ }
+ }
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-3.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-3.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-3.c (revision 0)
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_float } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ float image[N][N] __attribute__ ((__aligned__(16)));
+ float out[N];
+
+ /* Outer-loop vectorization. */
+
+ void
+ foo (){
+ int i,j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][i];
+ }
+ out[i]=diff;
+ }
+ }
+
+ int main (void)
+ {
+ check_vect ();
+ int i, j;
+ float diff;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ image[i][j]=i+j;
+ }
+ }
+
+ foo ();
+
+ for (i = 0; i < N; i++) {
+ diff = 0;
+ for (j = 0; j < N; j++) {
+ diff += image[j][i];
+ }
+ if (out[i] != diff)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-6.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
***************
*** 0 ****
--- 1,56 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int
+ foo (int * __restrict__ b, int k){
+ int i,j;
+ int sum,x;
+ int a[N];
+
+ for (i = 0; i < N; i++) {
+ sum = b[i];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+
+ return a[k];
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int b[N];
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i + 2;
+
+ for (i=0; i<N; i++)
+ a[i] = foo (b,i);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = b[i];
+ for (j = 0; j < N; j++){
+ sum += j;
+ }
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/vect-outer-4f.c
===================================================================
*** testsuite/gcc.dg/vect/vect-outer-4f.c (revision 0)
--- testsuite/gcc.dg/vect/vect-outer-4f.c (revision 0)
***************
*** 0 ****
--- 1,70 ----
+ /* { dg-require-effective-target vect_int } */
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ #define M 128
+ unsigned short in[N+M];
+ unsigned int out[N];
+ unsigned char arr[N];
+
+ /* Outer-loop vectorization. */
+ /* Not vectorized due to multiple-types in the inner-loop. */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff;
+ unsigned int s=0;
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ s+=diff;
+ }
+ return s;
+ }
+
+ unsigned int
+ bar (int i, unsigned int diff, unsigned short *in)
+ {
+ int j;
+ for (j = 0; j < M; j+=8) {
+ diff += in[j+i];
+ }
+ return diff;
+ }
+
+ int main (void)
+ {
+ int i, j;
+ unsigned int diff;
+ unsigned int s=0,sum=0;
+
+ check_vect ();
+
+ for (i = 0; i < N+M; i++) {
+ in[i] = i;
+ }
+
+ sum=foo ();
+
+ for (i = 0; i < N; i++) {
+ arr[i] = 3;
+ diff = 0;
+ diff = bar (i, diff, in);
+ s += diff;
+ }
+
+ if (s != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: not allowed" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-17.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
***************
*** 0 ****
--- 1,68 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+ int c[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ int diff = b[i] - c[i];
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+
+ sum += diff;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ b[i] = i;
+ c[i] = 2*i;
+ }
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += (b[i] - c[i]);
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c (revision 127202)
--- tree-vectorizer.c (working copy)
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1345,1351 ****
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
! if (TREE_CODE (stmt) == PHI_NODE)
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
--- 1345,1358 ----
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
!
! STMT_VINFO_DR_BASE_ADDRESS (res) = NULL;
! STMT_VINFO_DR_OFFSET (res) = NULL;
! STMT_VINFO_DR_INIT (res) = NULL;
! STMT_VINFO_DR_STEP (res) = NULL;
! STMT_VINFO_DR_ALIGNED_TO (res) = NULL;
!
! if (TREE_CODE (stmt) == PHI_NODE && is_loop_header_bb_p (bb_for_stmt (stmt)))
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1364,1369 ****
--- 1371,1390 ----
}
+ /* Function bb_in_loop_p
+
+ Used as predicate for dfs order traversal of the loop bbs. */
+
+ static bool
+ bb_in_loop_p (basic_block bb, void *data)
+ {
+ struct loop *loop = (struct loop *)data;
+ if (flow_bb_inside_loop_p (loop, bb))
+ return true;
+ return false;
+ }
+
+
/* Function new_loop_vec_info.
Create and initialize a new loop_vec_info struct for LOOP, as well as
*************** new_loop_vec_info (struct loop *loop)
*** 1375,1392 ****
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
bbs = get_loop_body (loop);
! /* Create stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
--- 1396,1444 ----
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i, nbbs;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
+ LOOP_VINFO_LOOP (res) = loop;
bbs = get_loop_body (loop);
! /* Create/Update stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
+ /* BBs in a nested inner-loop will already have been processed (because
+ we will have called vect_analyze_loop_form for any nested inner-loop).
+ Therefore, for stmts in an inner-loop we just want to update the
+ STMT_VINFO_LOOP_VINFO field of their stmt_info to point to the new
+ loop_info of the outer-loop we are currently considering to vectorize
+ (instead of the loop_info of the inner-loop).
+ For stmts in other BBs we need to create a stmt_info from scratch. */
+ if (bb->loop_father != loop)
+ {
+ /* Inner-loop bb. */
+ gcc_assert (loop->inner && bb->loop_father == loop->inner);
+ for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
+ {
+ stmt_vec_info stmt_info = vinfo_for_stmt (phi);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+ {
+ tree stmt = bsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ }
+ else
+ {
+ /* bb in current nest. */
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
*************** new_loop_vec_info (struct loop *loop)
*** 1396,1411 ****
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann;
!
! ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
- LOOP_VINFO_LOOP (res) = loop;
LOOP_VINFO_BBS (res) = bbs;
- LOOP_VINFO_EXIT_COND (res) = NULL;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
--- 1448,1471 ----
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
+ }
+
+ /* CHECKME: We want to visit all BBs before their successors (except for
+ latch blocks, for which this assertion wouldn't hold). In the simple
+ case of the loop forms we allow, a dfs order of the BBs would be the same
+ as reversed postorder traversal, so we are safe. */
+
+ free (bbs);
+ bbs = XCNEWVEC (basic_block, loop->num_nodes);
+ nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
+ bbs, loop->num_nodes, loop);
+ gcc_assert (nbbs == loop->num_nodes);
LOOP_VINFO_BBS (res) = bbs;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
*************** new_loop_vec_info (struct loop *loop)
*** 1427,1433 ****
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo)
{
struct loop *loop;
basic_block *bbs;
--- 1487,1493 ----
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
{
struct loop *loop;
basic_block *bbs;
*************** destroy_loop_vec_info (loop_vec_info loo
*** 1443,1448 ****
--- 1503,1520 ----
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
+ if (!clean_stmts)
+ {
+ free (LOOP_VINFO_BBS (loop_vinfo));
+ free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
+ free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
+ VEC_free (tree, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+
+ free (loop_vinfo);
+ loop->aux = NULL;
+ return;
+ }
+
for (j = 0; j < nbbs; j++)
{
basic_block bb = bbs[j];
*************** get_vectype_for_scalar_type (tree scalar
*** 1586,1608 ****
enum dr_alignment_support
vect_supportable_dr_alignment (struct data_reference *dr)
{
! tree vectype = STMT_VINFO_VECTYPE (vinfo_for_stmt (DR_STMT (dr)));
enum machine_mode mode = (int) TYPE_MODE (vectype);
if (aligned_access_p (dr))
return dr_aligned;
/* Possibly unaligned access. */
!
if (DR_IS_READ (dr))
{
if (vec_realign_load_optab->handlers[mode].insn_code != CODE_FOR_nothing
&& (!targetm.vectorize.builtin_mask_for_load
|| targetm.vectorize.builtin_mask_for_load ()))
! return dr_unaligned_software_pipeline;
!
if (movmisalign_optab->handlers[mode].insn_code != CODE_FOR_nothing)
- /* Can't software pipeline the loads, but can at least do them. */
return dr_unaligned_supported;
}
--- 1658,1778 ----
enum dr_alignment_support
vect_supportable_dr_alignment (struct data_reference *dr)
{
! tree stmt = DR_STMT (dr);
! stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
! tree vectype = STMT_VINFO_VECTYPE (stmt_info);
enum machine_mode mode = (int) TYPE_MODE (vectype);
+ struct loop *vect_loop = LOOP_VINFO_LOOP (STMT_VINFO_LOOP_VINFO (stmt_info));
+ bool nested_in_vect_loop = nested_in_vect_loop_p (vect_loop, stmt);
+ bool invariant_in_outerloop = false;
if (aligned_access_p (dr))
return dr_aligned;
+ if (nested_in_vect_loop)
+ {
+ tree outerloop_step = STMT_VINFO_DR_STEP (stmt_info);
+ invariant_in_outerloop =
+ (tree_int_cst_compare (outerloop_step, size_zero_node) == 0);
+ }
+
/* Possibly unaligned access. */
!
! /* We can choose between using the implicit realignment scheme (generating
! a misaligned_move stmt) and the explicit realignment scheme (generating
! aligned loads with a REALIGN_LOAD). There are two variants of the explicit
! realignment scheme: optimized and unoptimized.
! We can optimize the realignment only if the step between consecutive
! vector loads is equal to the vector size. Since the vector memory
! accesses advance in steps of VS (Vector Size) in the vectorized loop, it
! is guaranteed that the misalignment amount remains the same throughout the
! execution of the vectorized loop. Therefore, we can create the
! "realignment token" (the permutation mask that is passed to REALIGN_LOAD)
! at the loop preheader.
!
! However, in the case of outer-loop vectorization, when vectorizing a
! memory access in the inner-loop nested within the LOOP that is now being
! vectorized, while it is guaranteed that the misalignment of the
! vectorized memory access will remain the same in different outer-loop
! iterations, it is *not* guaranteed that is will remain the same throughout
! the execution of the inner-loop. This is because the inner-loop advances
! with the original scalar step (and not in steps of VS). If the inner-loop
! step happens to be a multiple of VS, then the misalignment remaines fixed
! and we can use the optimized relaignment scheme. For example:
!
! for (i=0; i<N; i++)
! for (j=0; j<M; j++)
! s += a[i+j];
!
! When vectorizing the i-loop in the above example, the step between
! consecutive vector loads is 1, and so the misalignment does not remain
! fixed across the execution of the inner-loop, and the realignment cannot
! be optimized (as illustrated in the following pseudo vectorized loop):
!
! for (i=0; i<N; i+=4)
! for (j=0; j<M; j++){
! vs += vp[i+j]; // misalignment of &vp[i+j] is {0,1,2,3,0,1,2,3,...}
! // when j is {0,1,2,3,4,5,6,7,...} respectively.
! // (assuming that we start from an aligned address).
! }
!
! We therefore have to use the unoptimized realignment scheme:
!
! for (i=0; i<N; i+=4)
! for (j=0; j<M; j++){
! rt = get_realignment_token (&vp[i+j]);
! v1 = vp[i+j];
! v2 = vp[i+j+VS-1];
! va = REALIGN_LOAD <v1,v2,rt>;
! vs += va;
! }
!
! On the other hand, when vectorizing the i-loop in the following example
! (that implements the same computation as above):
!
! for (k=0; k<4; k++)
! for (i=0; i<N; i++)
! for (j=k; j<M; j+=4)
! s += a[i+j];
!
! the step between consecutive vector loads is 4, which (assuming that
! the vector size is also 4) can be optimized:
!
! for (k=0; k<4; k++)
! for (i=0; i<N; i+=4)
! for (j=k; j<M; j+=4)
! vs += vp[i+j]; // misalignment of &vp[i+j] is always k (assuming
! // that the misalignment of the initial address is
! // 0).
!
! The loop can then be vectorized as follows:
!
! for (k=0; k<4; k++){
! rt = get_realignment_token (&vp[k]);
! for (i=0; i<N; i+=4){
! v1 = vp[i+k];
! for (j=k; j<M; j+=4){
! v2 = vp[i+j+VS-1];
! va = REALIGN_LOAD <v1,v2,rt>;
! vs += va;
! v1 = v2;
! }
! }
! } */
!
if (DR_IS_READ (dr))
{
if (vec_realign_load_optab->handlers[mode].insn_code != CODE_FOR_nothing
&& (!targetm.vectorize.builtin_mask_for_load
|| targetm.vectorize.builtin_mask_for_load ()))
! {
! if (nested_in_vect_loop
! && TREE_INT_CST_LOW (DR_STEP (dr)) != UNITS_PER_SIMD_WORD)
! return dr_explicit_realign;
! else
! return dr_explicit_realign_optimized;
! }
if (movmisalign_optab->handlers[mode].insn_code != CODE_FOR_nothing)
return dr_unaligned_supported;
}
*************** vect_is_simple_use (tree operand, loop_v
*** 1714,1721 ****
{
case PHI_NODE:
*def = PHI_RESULT (*def_stmt);
- gcc_assert (*dt == vect_induction_def || *dt == vect_reduction_def
- || *dt == vect_invariant_def);
break;
case GIMPLE_MODIFY_STMT:
--- 1884,1889 ----
*************** supportable_widening_operation (enum tre
*** 1756,1761 ****
--- 1924,1931 ----
enum tree_code *code1, enum tree_code *code2)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
bool ordered_p;
enum machine_mode vec_mode;
enum insn_code icode1, icode2;
*************** supportable_widening_operation (enum tre
*** 1778,1786 ****
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8]. */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction)
ordered_p = false;
else
ordered_p = true;
--- 1948,1962 ----
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
!
! When vectorizing outer-loops, we execute the inner-loop sequentially
! (each vectorized inner-loop iteration contributes to VF outer-loop
! iterations in parallel). We therefore don't allow changing the order
! of the computation in the inner-loop during outer-loop vectorization. */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
! && !nested_in_vect_loop_p (vect_loop, stmt))
ordered_p = false;
else
ordered_p = true;
*************** reduction_code_for_scalar_code (enum tre
*** 2004,2011 ****
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (struct loop *loop, tree phi)
{
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
--- 2180,2189 ----
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (loop_vec_info loop_info, tree phi)
{
+ struct loop *loop = (bb_for_stmt (phi))->loop_father;
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
*************** vect_is_simple_reduction (struct loop *l
*** 2018,2023 ****
--- 2196,2203 ----
imm_use_iterator imm_iter;
use_operand_p use_p;
+ gcc_assert (loop == vect_loop || flow_loop_nested_p (vect_loop, loop));
+
name = PHI_RESULT (phi);
nloop_uses = 0;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
*************** vect_is_simple_reduction (struct loop *l
*** 2129,2136 ****
return NULL_TREE;
}
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations)
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2309,2324 ----
return NULL_TREE;
}
+ /* Generally, when vectorizing a reduction we change the order of the
+ computation. This may change the behavior of the program in some
+ cases, so we need to check that this is ok. One exception is when
+ vectorizing an outer-loop: the inner-loop is executed sequentially,
+ and therefore vectorizing reductions in the inner-loop during
+ outer-loop vectorization is safe. */
+
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2140,2146 ****
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2328,2335 ----
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2169,2181 ****
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop by a GIMPLE_MODIFY_STMT,
! or it's an induction (defined by some phi node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
--- 2358,2373 ----
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop ("vect_loop_def"),
! or it's an induction (defined by a loop-header phi-node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def
! || (TREE_CODE (def1) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def1)))))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
*************** vect_is_simple_reduction (struct loop *l
*** 2187,2193 ****
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
--- 2379,2388 ----
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def
! || (TREE_CODE (def2) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def2)))))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
*************** vectorize_loops (void)
*** 2326,2332 ****
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo);
loop->aux = NULL;
}
--- 2521,2527 ----
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo, true);
loop->aux = NULL;
}
Index: tree-vectorizer.h
===================================================================
*** tree-vectorizer.h (revision 127202)
--- tree-vectorizer.h (working copy)
*************** enum operation_type {
*** 53,59 ****
enum dr_alignment_support {
dr_unaligned_unsupported,
dr_unaligned_supported,
! dr_unaligned_software_pipeline,
dr_aligned
};
--- 53,60 ----
enum dr_alignment_support {
dr_unaligned_unsupported,
dr_unaligned_supported,
! dr_explicit_realign,
! dr_explicit_realign_optimized,
dr_aligned
};
*************** typedef struct _loop_vec_info {
*** 92,100 ****
/* The loop basic blocks. */
basic_block *bbs;
- /* The loop exit_condition. */
- tree exit_cond;
-
/* Number of iterations. */
tree num_iters;
--- 93,98 ----
*************** typedef struct _loop_vec_info {
*** 144,150 ****
/* Access Functions. */
#define LOOP_VINFO_LOOP(L) (L)->loop
#define LOOP_VINFO_BBS(L) (L)->bbs
- #define LOOP_VINFO_EXIT_COND(L) (L)->exit_cond
#define LOOP_VINFO_NITERS(L) (L)->num_iters
#define LOOP_VINFO_COST_MODEL_MIN_ITERS(L) (L)->min_profitable_iters
#define LOOP_VINFO_VECTORIZABLE_P(L) (L)->vectorizable
--- 142,147 ----
*************** typedef struct _loop_vec_info {
*** 165,170 ****
--- 162,180 ----
#define LOOP_VINFO_NITERS_KNOWN_P(L) \
NITERS_KNOWN_P((L)->num_iters)
+ static inline loop_vec_info
+ loop_vec_info_for_loop (struct loop *loop)
+ {
+ return (loop_vec_info) loop->aux;
+ }
+
+ static inline bool
+ nested_in_vect_loop_p (struct loop *loop, tree stmt)
+ {
+ return (loop->inner
+ && (loop->inner == (bb_for_stmt (stmt))->loop_father));
+ }
+
/*-----------------------------------------------------------------*/
/* Info on vectorized defs. */
/*-----------------------------------------------------------------*/
*************** enum stmt_vec_info_type {
*** 180,191 ****
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
--- 190,204 ----
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type,
! loop_exit_ctrl_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
+ vect_used_in_outer_by_reduction,
+ vect_used_in_outer,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
*************** typedef struct _stmt_vec_info {
*** 232,240 ****
data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
at most one such data-ref. **/
! /* Information about the data-ref (access function, etc). */
struct data_reference *data_ref_info;
/* Stmt is part of some pattern (computation idiom) */
bool in_pattern_p;
--- 245,262 ----
data-ref (array/pointer/struct access). A GIMPLE stmt is expected to have
at most one such data-ref. **/
! /* Information about the data-ref (access function, etc),
! relative to the inner-most containing loop. */
struct data_reference *data_ref_info;
+ /* Information about the data-ref relative to this loop
+ nest (the loop that is being considered for vectorization). */
+ tree dr_base_address;
+ tree dr_init;
+ tree dr_offset;
+ tree dr_step;
+ tree dr_aligned_to;
+
/* Stmt is part of some pattern (computation idiom) */
bool in_pattern_p;
*************** typedef struct _stmt_vec_info {
*** 293,298 ****
--- 315,327 ----
#define STMT_VINFO_VECTYPE(S) (S)->vectype
#define STMT_VINFO_VEC_STMT(S) (S)->vectorized_stmt
#define STMT_VINFO_DATA_REF(S) (S)->data_ref_info
+
+ #define STMT_VINFO_DR_BASE_ADDRESS(S) (S)->dr_base_address
+ #define STMT_VINFO_DR_INIT(S) (S)->dr_init
+ #define STMT_VINFO_DR_OFFSET(S) (S)->dr_offset
+ #define STMT_VINFO_DR_STEP(S) (S)->dr_step
+ #define STMT_VINFO_DR_ALIGNED_TO(S) (S)->dr_aligned_to
+
#define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
#define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
#define STMT_VINFO_SAME_ALIGN_REFS(S) (S)->same_align_refs
*************** is_pattern_stmt_p (stmt_vec_info stmt_in
*** 403,408 ****
--- 432,446 ----
return false;
}
+ static inline bool
+ is_loop_header_bb_p (basic_block bb)
+ {
+ if (bb == (bb->loop_father)->header)
+ return true;
+ gcc_assert (EDGE_COUNT (bb->preds) == 1);
+ return false;
+ }
+
/*-----------------------------------------------------------------*/
/* Info on data references alignment. */
/*-----------------------------------------------------------------*/
*************** extern tree get_vectype_for_scalar_type
*** 462,468 ****
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (struct loop *, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
--- 500,506 ----
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (loop_vec_info, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
*************** extern bool supportable_narrowing_operat
*** 474,480 ****
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
--- 512,518 ----
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info, bool);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
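The new nested_in_vect_loop_p predicate added above simply tests whether a statement's containing basic block belongs to the inner loop of the loop being vectorized. A minimal sketch of the loop shape it distinguishes (not part of the patch; names are illustrative):

#define N 40
int a[N];

/* If LOOP is the i-loop being vectorized, nested_in_vect_loop_p (LOOP, stmt)
   is true exactly for statements whose bb->loop_father is LOOP->inner.  */
void sketch_nesting (void)
{
  int i, j, sum;

  for (i = 0; i < N; i++)     /* LOOP: the outer-loop being vectorized      */
    {
      sum = 0;                /* not nested: bb->loop_father == LOOP        */
      for (j = 0; j < N; j++) /* LOOP->inner                                */
        sum += j;             /* nested: bb->loop_father == LOOP->inner     */
      a[i] = sum;             /* not nested: back in LOOP's own blocks      */
    }
}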
Index: tree-data-ref.c
===================================================================
*** tree-data-ref.c (revision 127202)
--- tree-data-ref.c (working copy)
*************** dump_ddrs (FILE *file, VEC (ddr_p, heap)
*** 489,495 ****
/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF
will be ssizetype. */
! static void
split_constant_offset (tree exp, tree *var, tree *off)
{
tree type = TREE_TYPE (exp), otype;
--- 489,495 ----
/* Expresses EXP as VAR + OFF, where off is a constant. The type of OFF
will be ssizetype. */
! void
split_constant_offset (tree exp, tree *var, tree *off)
{
tree type = TREE_TYPE (exp), otype;
Index: tree-data-ref.h
===================================================================
*** tree-data-ref.h (revision 127202)
--- tree-data-ref.h (working copy)
*************** index_in_loop_nest (int var, VEC (loop_p
*** 388,391 ****
--- 388,394 ----
/* In lambda-code.c */
bool lambda_transform_legal_p (lambda_trans_matrix, int, VEC (ddr_p, heap) *);
+ /* In tree-data-refs.c */
+ void split_constant_offset (tree , tree *, tree *);
+
#endif /* GCC_TREE_DATA_REF_H */
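split_constant_offset is exported because the new outer-loop data-ref analysis in tree-vect-analyze.c (below) uses it to peel the constant part of the outer-loop address evolution into STMT_VINFO_DR_INIT. A rough worked example of what that analysis records, not part of the patch (2-byte elements assumed; names are illustrative and the values are approximate rather than taken from an actual dump):

#define N 40
#define M 128
unsigned short in[N + M];
unsigned int res[N];

void sketch_outer_dr (void)
{
  int i, j;
  unsigned int d;

  for (i = 0; i < N; i++)          /* outer-loop being vectorized           */
    {
      d = 0;
      for (j = 0; j < M; j += 8)
        d += in[j + i];            /* inner-loop analysis: DR_STEP = 8 * 2
                                      = 16 bytes per inner iteration.
                                      outer-loop analysis: the first location
                                      accessed in iteration i is &in[i], so
                                      roughly STMT_VINFO_DR_BASE_ADDRESS = &in,
                                      STMT_VINFO_DR_INIT = 0 and
                                      STMT_VINFO_DR_STEP = 2 bytes (one
                                      element per outer iteration); a constant
                                      in the index, e.g. in[j + i + 3], would
                                      be split off into STMT_VINFO_DR_INIT
                                      via split_constant_offset.            */
      res[i] = d;
    }
}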
Index: tree-vect-analyze.c
===================================================================
*** tree-vect-analyze.c (revision 127202)
--- tree-vect-analyze.c (working copy)
*************** vect_analyze_operations (loop_vec_info l
*** 325,330 ****
--- 325,348 ----
print_generic_expr (vect_dump, phi, TDF_SLIM);
}
+ if (! is_loop_header_bb_p (bb))
+ {
+ /* inner-loop loop-closed exit phi in outer-loop vectorization
+ (i.e. a phi in the tail of the outer-loop).
+ FORNOW: we currently don't support the case in which these phis
+ are not used in the outer-loop, because that case requires
+ us to actually do something here. */
+ if (!STMT_VINFO_RELEVANT_P (stmt_info)
+ || STMT_VINFO_LIVE_P (stmt_info))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump,
+ "Unsupported loop-closed phi in outer-loop.");
+ return false;
+ }
+ continue;
+ }
+
gcc_assert (stmt_info);
if (STMT_VINFO_LIVE_P (stmt_info))
*************** vect_analyze_operations (loop_vec_info l
*** 398,404 ****
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_unused_in_loop);
break;
case vect_induction_def:
--- 416,424 ----
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_used_in_outer
! || relevance == vect_used_in_outer_by_reduction
! || relevance == vect_unused_in_loop);
break;
case vect_induction_def:
*************** exist_non_indexing_operands_for_use_p (t
*** 589,638 ****
}
! /* Function vect_analyze_scalar_cycles.
!
! Examine the cross iteration def-use cycles of scalar variables, by
! analyzing the loop (scalar) PHIs; Classify each cycle as one of the
! following: invariant, induction, reduction, unknown.
!
! Some forms of scalar cycles are not yet supported.
!
! Example1: reduction: (unsupported yet)
!
! loop1:
! for (i=0; i<N; i++)
! sum += a[i];
!
! Example2: induction: (unsupported yet)
!
! loop2:
! for (i=0; i<N; i++)
! a[i] = i;
!
! Note: the following loop *is* vectorizable:
!
! loop3:
! for (i=0; i<N; i++)
! a[i] = b[i];
! even though it has a def-use cycle caused by the induction variable i:
!
! loop: i_2 = PHI (i_0, i_1)
! a[i_2] = ...;
! i_1 = i_2 + 1;
! GOTO loop;
!
! because the def-use cycle in loop3 is considered "not relevant" - i.e.,
! it does not need to be vectorized because it is only used for array
! indexing (see 'mark_stmts_to_be_vectorized'). The def-use cycle in
! loop2 on the other hand is relevant (it is being written to memory).
! */
static void
! vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
{
tree phi;
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
--- 609,625 ----
}
! /* Function vect_analyze_scalar_cycles_1.
! Examine the cross iteration def-use cycles of scalar variables
! in LOOP. LOOP_VINFO represents the loop that is now being
! considered for vectorization (can be LOOP, or an outer-loop
! enclosing LOOP). */
static void
! vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
{
tree phi;
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 698,704 ****
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
--- 685,691 ----
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop_vinfo, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 717,722 ****
--- 704,751 ----
}
+ /* Function vect_analyze_scalar_cycles.
+
+ Examine the cross iteration def-use cycles of scalar variables, by
+ analyzing the loop-header PHIs of scalar variables; Classify each
+ cycle as one of the following: invariant, induction, reduction, unknown.
+ We do that for the loop represented by LOOP_VINFO, and also for its
+ inner-loop, if it exists.
+ Examples for scalar cycles:
+
+ Example1: reduction:
+
+ loop1:
+ for (i=0; i<N; i++)
+ sum += a[i];
+
+ Example2: induction:
+
+ loop2:
+ for (i=0; i<N; i++)
+ a[i] = i; */
+
+ static void
+ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop);
+
+ /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
+ Reductions in such an inner-loop therefore have different properties than
+ the reductions in the nest that gets vectorized:
+ 1. When vectorized, they are executed in the same order as in the original
+ scalar loop, so we can't change the order of computation when
+ vectorizing them.
+ 2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
+ current checks are too strict. */
+
+ if (loop->inner)
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
+ }
+
+
/* Function vect_insert_into_interleaving_chain.
Insert DRA into the interleaving chain of DRB according to DRA's INIT. */
*************** vect_compute_data_ref_alignment (struct
*** 1164,1169 ****
--- 1193,1200 ----
{
tree stmt = DR_STMT (dr);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree ref = DR_REF (dr);
tree vectype;
tree base, base_addr;
*************** vect_compute_data_ref_alignment (struct
*** 1180,1192 ****
misalign = DR_INIT (dr);
aligned_to = DR_ALIGNED_TO (dr);
base_addr = DR_BASE_ADDRESS (dr);
base = build_fold_indirect_ref (base_addr);
vectype = STMT_VINFO_VECTYPE (stmt_info);
alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
! if (tree_int_cst_compare (aligned_to, alignment) < 0)
{
! if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "Unknown alignment for access: ");
print_generic_expr (vect_dump, base, TDF_SLIM);
--- 1211,1252 ----
misalign = DR_INIT (dr);
aligned_to = DR_ALIGNED_TO (dr);
base_addr = DR_BASE_ADDRESS (dr);
+
+ /* In case the dataref is in an inner-loop of the loop that is being
+ vectorized (LOOP), we use the base and misalignment information
+ relative to the outer-loop (LOOP). This is ok only if the misalignment
+ stays the same throughout the execution of the inner-loop, which is why
+ we have to check that the stride of the dataref in the inner-loop is a
+ multiple of the vector size. */
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ tree step = DR_STEP (dr);
+ HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
+
+ if (dr_step % UNITS_PER_SIMD_WORD == 0)
+ {
+ if (vect_print_dump_info (REPORT_ALIGNMENT))
+ fprintf (vect_dump, "inner step divides the vector-size.");
+ misalign = STMT_VINFO_DR_INIT (stmt_info);
+ aligned_to = STMT_VINFO_DR_ALIGNED_TO (stmt_info);
+ base_addr = STMT_VINFO_DR_BASE_ADDRESS (stmt_info);
+ }
+ else
+ {
+ if (vect_print_dump_info (REPORT_ALIGNMENT))
+ fprintf (vect_dump, "inner step doesn't divide the vector-size.");
+ misalign = NULL_TREE;
+ }
+ }
+
base = build_fold_indirect_ref (base_addr);
vectype = STMT_VINFO_VECTYPE (stmt_info);
alignment = ssize_int (TYPE_ALIGN (vectype)/BITS_PER_UNIT);
! if ((aligned_to && tree_int_cst_compare (aligned_to, alignment) < 0)
! || !misalign)
{
! if (vect_print_dump_info (REPORT_ALIGNMENT))
{
fprintf (vect_dump, "Unknown alignment for access: ");
print_generic_expr (vect_dump, base, TDF_SLIM);
*************** vect_enhance_data_refs_alignment (loop_v
*** 1722,1728 ****
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning = flag_tree_vect_loop_version && (!optimize_size);
if (do_versioning)
{
--- 1782,1791 ----
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning =
! flag_tree_vect_loop_version
! && (!optimize_size)
! && (!loop->inner);
if (do_versioning)
{
*************** static bool
*** 1855,1874 ****
vect_analyze_data_ref_access (struct data_reference *dr)
{
tree step = DR_STEP (dr);
- HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
tree scalar_type = TREE_TYPE (DR_REF (dr));
HOST_WIDE_INT type_size = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
tree stmt = DR_STMT (dr);
! /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
! interleaving group (including gaps). */
! HOST_WIDE_INT stride = dr_step / type_size;
! if (!step)
{
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (vect_dump, "bad data-ref access");
! return false;
}
/* Consecutive? */
if (!tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type)))
--- 1918,1956 ----
vect_analyze_data_ref_access (struct data_reference *dr)
{
tree step = DR_STEP (dr);
tree scalar_type = TREE_TYPE (DR_REF (dr));
HOST_WIDE_INT type_size = TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type));
tree stmt = DR_STMT (dr);
! stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
! HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step);
! HOST_WIDE_INT stride;
! /* Don't allow invariant accesses. */
! if (dr_step == 0)
! return false;
!
! if (nested_in_vect_loop_p (loop, stmt))
{
! /* For the rest of the analysis we use the outer-loop step. */
! step = STMT_VINFO_DR_STEP (stmt_info);
! dr_step = TREE_INT_CST_LOW (step);
!
! if (dr_step == 0)
! {
! if (vect_print_dump_info (REPORT_ALIGNMENT))
! fprintf (vect_dump, "zero step in outer loop.");
! if (DR_IS_READ (dr))
! return true;
! else
! return false;
! }
}
+
+ /* For interleaving, STRIDE is STEP counted in elements, i.e., the size of the
+ interleaving group (including gaps). */
+ stride = dr_step / type_size;
/* Consecutive? */
if (!tree_int_cst_compare (step, TYPE_SIZE_UNIT (scalar_type)))
*************** vect_analyze_data_ref_access (struct dat
*** 1878,1883 ****
--- 1960,1972 ----
return true;
}
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ if (vect_print_dump_info (REPORT_ALIGNMENT))
+ fprintf (vect_dump, "strided access in outer loop.");
+ return false;
+ }
+
/* Not consecutive access is possible only if it is a part of interleaving. */
if (!DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)))
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2105,2110 ****
--- 2194,2201 ----
{
tree stmt;
stmt_vec_info stmt_info;
+ basic_block bb;
+ tree base, offset, init;
if (!dr || !DR_REF (dr))
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2112,2137 ****
fprintf (vect_dump, "not vectorized: unhandled data-ref ");
return false;
}
!
! /* Update DR field in stmt_vec_info struct. */
stmt = DR_STMT (dr);
stmt_info = vinfo_for_stmt (stmt);
- if (STMT_VINFO_DATA_REF (stmt_info))
- {
- if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
- {
- fprintf (vect_dump,
- "not vectorized: more than one data ref in stmt: ");
- print_generic_expr (vect_dump, stmt, TDF_SLIM);
- }
- return false;
- }
- STMT_VINFO_DATA_REF (stmt_info) = dr;
-
/* Check that analysis of the data-ref succeeded. */
if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
! || !DR_STEP (dr))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
--- 2203,2215 ----
fprintf (vect_dump, "not vectorized: unhandled data-ref ");
return false;
}
!
stmt = DR_STMT (dr);
stmt_info = vinfo_for_stmt (stmt);
/* Check that analysis of the data-ref succeeded. */
if (!DR_BASE_ADDRESS (dr) || !DR_OFFSET (dr) || !DR_INIT (dr)
! || !DR_STEP (dr))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2158,2164 ****
}
return false;
}
!
/* Set vectype for STMT. */
scalar_type = TREE_TYPE (DR_REF (dr));
STMT_VINFO_VECTYPE (stmt_info) =
--- 2236,2362 ----
}
return false;
}
!
! base = unshare_expr (DR_BASE_ADDRESS (dr));
! offset = unshare_expr (DR_OFFSET (dr));
! init = unshare_expr (DR_INIT (dr));
!
! /* Update DR field in stmt_vec_info struct. */
! bb = bb_for_stmt (stmt);
!
! /* If the dataref is in an inner-loop of the loop that is considered
! for vectorization, we also want to analyze the access relative to
! the outer-loop (DR contains information only relative to the
! inner-most enclosing loop). We do that by building a reference to the
! first location accessed by the inner-loop, and analyzing it relative
! to the outer-loop. */
! if (nested_in_vect_loop_p (loop, stmt))
! {
! tree outer_step, outer_base, outer_init;
! HOST_WIDE_INT pbitsize, pbitpos;
! tree poffset;
! enum machine_mode pmode;
! int punsignedp, pvolatilep;
! affine_iv base_iv, offset_iv;
! tree dinit;
!
! /* Build a reference to the first location accessed by the
! inner-loop: *(BASE+INIT). (The first location is actually
! BASE+INIT+OFFSET, but we add OFFSET separately later.) */
! tree inner_base = build_fold_indirect_ref
! (fold_build2 (PLUS_EXPR, TREE_TYPE (base), base, init));
!
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (dump_file, "analyze in outer-loop: ");
! print_generic_expr (dump_file, inner_base, TDF_SLIM);
! }
!
! outer_base = get_inner_reference (inner_base, &pbitsize, &pbitpos,
! &poffset, &pmode, &punsignedp, &pvolatilep, false);
! gcc_assert (outer_base != NULL_TREE);
!
! if (pbitpos % BITS_PER_UNIT != 0)
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (dump_file, "failed: bit offset alignment.\n");
! return false;
! }
!
! outer_base = build_fold_addr_expr (outer_base);
! if (!simple_iv (loop, stmt, outer_base, &base_iv, false))
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (dump_file, "failed: evolution of base is not affine.\n");
! return false;
! }
!
! if (offset)
! {
! if (poffset)
! poffset = fold_build2 (PLUS_EXPR, TREE_TYPE (offset), offset, poffset);
! else
! poffset = offset;
! }
!
! if (!poffset)
! {
! offset_iv.base = ssize_int (0);
! offset_iv.step = ssize_int (0);
! }
! else if (!simple_iv (loop, stmt, poffset, &offset_iv, false))
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (dump_file, "evolution of offset is not affine.\n");
! return false;
! }
!
! outer_init = ssize_int (pbitpos / BITS_PER_UNIT);
! split_constant_offset (base_iv.base, &base_iv.base, &dinit);
! outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
! split_constant_offset (offset_iv.base, &offset_iv.base, &dinit);
! outer_init = size_binop (PLUS_EXPR, outer_init, dinit);
!
! outer_step = size_binop (PLUS_EXPR,
! fold_convert (ssizetype, base_iv.step),
! fold_convert (ssizetype, offset_iv.step));
!
! STMT_VINFO_DR_STEP (stmt_info) = outer_step;
! /* FIXME: Use canonicalize_base_object_address (base_iv.base); */
! STMT_VINFO_DR_BASE_ADDRESS (stmt_info) = base_iv.base;
! STMT_VINFO_DR_INIT (stmt_info) = outer_init;
! STMT_VINFO_DR_OFFSET (stmt_info) =
! fold_convert (ssizetype, offset_iv.base);
! STMT_VINFO_DR_ALIGNED_TO (stmt_info) =
! size_int (highest_pow2_factor (offset_iv.base));
!
! if (dump_file && (dump_flags & TDF_DETAILS))
! {
! fprintf (dump_file, "\touter base_address: ");
! print_generic_expr (dump_file, STMT_VINFO_DR_BASE_ADDRESS (stmt_info), TDF_SLIM);
! fprintf (dump_file, "\n\touter offset from base address: ");
! print_generic_expr (dump_file, STMT_VINFO_DR_OFFSET (stmt_info), TDF_SLIM);
! fprintf (dump_file, "\n\touter constant offset from base address: ");
! print_generic_expr (dump_file, STMT_VINFO_DR_INIT (stmt_info), TDF_SLIM);
! fprintf (dump_file, "\n\touter step: ");
! print_generic_expr (dump_file, STMT_VINFO_DR_STEP (stmt_info), TDF_SLIM);
! fprintf (dump_file, "\n\touter aligned to: ");
! print_generic_expr (dump_file, STMT_VINFO_DR_ALIGNED_TO (stmt_info), TDF_SLIM);
! }
! }
!
! if (STMT_VINFO_DATA_REF (stmt_info))
! {
! if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
! {
! fprintf (vect_dump,
! "not vectorized: more than one data ref in stmt: ");
! print_generic_expr (vect_dump, stmt, TDF_SLIM);
! }
! return false;
! }
! STMT_VINFO_DATA_REF (stmt_info) = dr;
!
/* Set vectype for STMT. */
scalar_type = TREE_TYPE (DR_REF (dr));
STMT_VINFO_VECTYPE (stmt_info) =
*************** vect_mark_relevant (VEC(tree,heap) **wor
*** 2204,2214 ****
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to vectorized.
Instead mark the pattern-stmt that replaces it. */
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
- pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
--- 2402,2414 ----
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to be vectorized.
Instead mark the pattern-stmt that replaces it. */
+
+ pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2258,2264 ****
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt) && (stmt != LOOP_VINFO_EXIT_COND (loop_vinfo)))
*relevant = vect_used_in_loop;
/* changing memory. */
--- 2458,2465 ----
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt)
! && STMT_VINFO_TYPE (vinfo_for_stmt (stmt)) != loop_exit_ctrl_vec_info_type)
*relevant = vect_used_in_loop;
/* changing memory. */
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2315,2320 ****
--- 2516,2523 ----
of the respective DEF_STMT is left unchanged.
- case 2: If STMT is a reduction phi and DEF_STMT is a reduction stmt, we
skip DEF_STMT cause it had already been processed.
+ - case 3: If DEF_STMT and STMT are in different nests, then "relevant" will
+ be modified accordingly.
Return true if everything is as expected. Return false otherwise. */
*************** process_use (tree stmt, tree use, loop_v
*** 2325,2331 ****
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block def_bb;
tree def, def_stmt;
enum vect_def_type dt;
--- 2528,2534 ----
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block bb, def_bb;
tree def, def_stmt;
enum vect_def_type dt;
*************** process_use (tree stmt, tree use, loop_v
*** 2346,2362 ****
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! return true;
! /* case 2: A reduction phi defining a reduction stmt (DEF_STMT). DEF_STMT
! must have already been processed, so we just check that everything is as
! expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def)
{
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
--- 2549,2575 ----
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (vect_dump, "def_stmt is out of loop.");
! return true;
! }
! /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
! DEF_STMT must have already been processed, because this should be the
! only way that STMT, which is a reduction-phi, was put in the worklist,
! as there should be no other uses for DEF_STMT in the loop. So we just
! check that everything is as expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
+ bb = bb_for_stmt (stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def
! && bb->loop_father == def_bb->loop_father)
{
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
*************** process_use (tree stmt, tree use, loop_v
*** 2365,2370 ****
--- 2578,2650 ----
return true;
}
+ /* case 3a: outer-loop stmt defining an inner-loop stmt:
+ outer-loop-header-bb:
+ d = def_stmt
+ inner-loop:
+ stmt # use (d)
+ outer-loop-tail-bb:
+ ... */
+ if (flow_loop_nested_p (def_bb->loop_father, bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop def-stmt defining inner-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_by_reduction : vect_unused_in_loop;
+ break;
+ case vect_used_in_outer_by_reduction:
+ relevant = vect_used_by_reduction;
+ break;
+ case vect_used_in_outer:
+ relevant = vect_used_in_loop;
+ break;
+ case vect_used_by_reduction:
+ case vect_used_in_loop:
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
+ /* case 3b: inner-loop stmt defining an outer-loop stmt:
+ outer-loop-header-bb:
+ ...
+ inner-loop:
+ d = def_stmt
+ outer-loop-tail-bb:
+ stmt # use (d) */
+ else if (flow_loop_nested_p (bb->loop_father, def_bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "inner-loop def-stmt defining outer-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_in_outer_by_reduction : vect_unused_in_loop;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ break;
+
+ case vect_used_by_reduction:
+ relevant = vect_used_in_outer_by_reduction;
+ break;
+
+ case vect_used_in_loop:
+ relevant = vect_used_in_outer;
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
vect_mark_relevant (worklist, def_stmt, relevant, live_p);
return true;
}
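To illustrate, cases (3a) and (3b) above correspond to cross-nest def/use chains
of roughly the following shape (the names a, b, d, t, N, M are used here only for
illustration):

for (i=0; i<N; i++){
  d = a[i];          /* outer-loop def                              */
  t = 0;
  for (j=0; j<M; j++)
    t = t + d;       /* inner-loop use of the outer def d: case 3a  */
  b[i] = t;          /* outer-loop use of the inner def t: case 3b  */
}

In both cases the switches above translate the "relevant" value of the use across
the nest boundary before DEF_STMT is marked and added to the worklist.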
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2473,2497 ****
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt; Other
! reduction stmts are expected to be unused in the loop. These are the
! expected values of "relevant" for reduction phis/stmts in the loop:
relevance: phi stmt
vect_unused_in_loop ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! switch (relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
break;
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
--- 2753,2790 ----
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt, or by a stmt
! in an outer loop; other reduction stmts are expected to be unused in the
! loop, or used by a stmt in an outer loop.
! Here are the expected values of "relevant" for reduction phis/stmts:
relevance: phi stmt
vect_unused_in_loop ok
+ vect_used_in_outer_by_reduction ok ok
+ vect_used_in_outer ok ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! enum vect_relevant tmp_relevant = relevant;
! switch (tmp_relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
+ relevant = vect_used_by_reduction;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ gcc_assert (TREE_CODE (stmt) != WIDEN_SUM_EXPR
+ && TREE_CODE (stmt) != DOT_PROD_EXPR);
break;
+
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
+ /* fall through */
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2499,2505 ****
VEC_free (tree, heap, worklist);
return false;
}
- relevant = vect_used_by_reduction;
live_p = false;
}
--- 2792,2797 ----
*************** vect_get_loop_niters (struct loop *loop,
*** 2641,2651 ****
}
/* Function vect_analyze_loop_form.
! Verify the following restrictions (some may be relaxed in the future):
! - it's an inner-most loop
! - number of BBs = 2 (which are the loop header and the latch)
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
--- 2933,2971 ----
}
+ /* Function vect_analyze_loop_1.
+
+ Apply a set of analyses on LOOP, and create a loop_vec_info struct
+ for it. The different analyses will record information in the
+ loop_vec_info struct. This is a subset of the analyses applied in
+ vect_analyze_loop, to be applied on an inner-loop nested in the loop
+ that is now considered for (outer-loop) vectorization. */
+
+ static loop_vec_info
+ vect_analyze_loop_1 (struct loop *loop)
+ {
+ loop_vec_info loop_vinfo;
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "===== analyze_loop_nest_1 =====");
+
+ /* Check the CFG characteristics of the loop (nesting, entry/exit, etc.).  */
+
+ loop_vinfo = vect_analyze_loop_form (loop);
+ if (!loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "bad inner-loop form.");
+ return NULL;
+ }
+
+ return loop_vinfo;
+ }
+
+
/* Function vect_analyze_loop_form.
! Verify that certain CFG restrictions hold, including:
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
*************** vect_analyze_loop_form (struct loop *loo
*** 2657,2687 ****
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! if (loop->inner)
{
! if (vect_print_dump_info (REPORT_OUTER_LOOPS))
! fprintf (vect_dump, "not vectorized: nested loop.");
return NULL;
}
if (!single_exit (loop)
- || loop->num_nodes != 2
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
- else if (loop->num_nodes != 2)
- fprintf (vect_dump, "not vectorized: too many BBs in loop.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
!
return NULL;
}
--- 2977,3100 ----
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
+ loop_vec_info inner_loop_vinfo = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! /* Different restrictions apply when we are considering an inner-most loop,
! vs. an outer (nested) loop.
! (FORNOW. May want to relax some of these restrictions in the future). */
!
! if (!loop->inner)
! {
! /* Inner-most loop. We currently require that the number of BBs is
! exactly 2 (the header and latch). Vectorizable inner-most loops
! look like this:
!
! (pre-header)
! |
! header <--------+
! | | |
! | +--> latch --+
! |
! (exit-bb) */
!
! if (loop->num_nodes != 2)
! {
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: too many BBs in loop.");
! return NULL;
! }
!
! if (empty_block_p (loop->header))
{
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: empty loop.");
return NULL;
}
+ }
+ else
+ {
+ struct loop *innerloop = loop->inner;
+ edge backedge, entryedge;
+
+ /* Nested loop. We currently require that the loop is doubly-nested,
+ contains a single inner loop, and the number of BBs is exactly 5.
+ Vectorizable outer-loops look like this:
+
+ (pre-header)
+ |
+ header <---+
+ | |
+ inner-loop |
+ | |
+ tail ------+
+ |
+ (exit-bb)
+
+ The inner-loop has the properties expected of inner-most loops
+ as described above. */
+
+ if ((loop->inner)->inner || (loop->inner)->next)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: multiple nested loops.");
+ return NULL;
+ }
+
+ /* Analyze the inner-loop. */
+ inner_loop_vinfo = vect_analyze_loop_1 (loop->inner);
+ if (!inner_loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: Bad inner loop.");
+ return NULL;
+ }
+
+ if (loop->num_nodes != 5)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: too many BBs in loop.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ gcc_assert (EDGE_COUNT (innerloop->header->preds) == 2);
+ backedge = EDGE_PRED (innerloop->header, 1);
+ entryedge = EDGE_PRED (innerloop->header, 0);
+ if (EDGE_PRED (innerloop->header, 0)->src == innerloop->latch)
+ {
+ backedge = EDGE_PRED (innerloop->header, 0);
+ entryedge = EDGE_PRED (innerloop->header, 1);
+ }
+
+ if (entryedge->src != loop->header
+ || !single_exit (innerloop)
+ || single_exit (innerloop)->dest != EDGE_PRED (loop->latch, 0)->src)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: unsupported outerloop form.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "Considering outer-loop vectorization.");
+ }
if (!single_exit (loop)
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
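For reference, the outer-loop form accepted above corresponds to loop nests of
roughly the following shape (an illustrative sketch only; c, x, n, m are made-up
names), with a single inner-most loop whose exit falls through to the outer-loop
tail:

for (i=0; i<n; i++){
  x = c[i];
  for (j=0; j<m; j++)
    x = x + x;
  c[i] = x;
}

Nests with sibling or more deeply nested inner loops, or with extra basic blocks
beyond the 5-BB form described above, are rejected ("multiple nested loops" /
"too many BBs in loop").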
*************** vect_analyze_loop_form (struct loop *loo
*** 2694,2699 ****
--- 3107,3114 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: unexpected loop form.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2711,2732 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
return NULL;
}
}
- if (empty_block_p (loop->header))
- {
- if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
- fprintf (vect_dump, "not vectorized: empty loop.");
- return NULL;
- }
-
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
return NULL;
}
--- 3126,3144 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
}
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2735,2740 ****
--- 3147,3154 ----
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump,
"not vectorized: number of iterations cannot be computed.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2742,2748 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! return false;
}
if (!NITERS_KNOWN_P (number_of_iterations))
--- 3156,3164 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
! return NULL;
}
if (!NITERS_KNOWN_P (number_of_iterations))
*************** vect_analyze_loop_form (struct loop *loo
*** 2757,2768 ****
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
! LOOP_VINFO_EXIT_COND (loop_vinfo) = loop_cond;
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
--- 3173,3191 ----
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, false);
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
!
! STMT_VINFO_TYPE (vinfo_for_stmt (loop_cond)) = loop_exit_ctrl_vec_info_type;
!
! /* CHECKME: May want to keep it around in the future.  */
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, false);
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
*************** vect_analyze_loop (struct loop *loop)
*** 2784,2789 ****
--- 3207,3221 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "===== analyze_loop_nest =====");
+ if (loop_outer (loop)
+ && loop_vec_info_for_loop (loop_outer (loop))
+ && LOOP_VINFO_VECTORIZABLE_P (loop_vec_info_for_loop (loop_outer (loop))))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop already vectorized.");
+ return NULL;
+ }
+
/* Check the CFG characteristics of the loop (nesting, entry/exit, etc. */
loop_vinfo = vect_analyze_loop_form (loop);
*************** vect_analyze_loop (struct loop *loop)
*** 2805,2811 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3237,3243 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2823,2829 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3255,3261 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2835,2841 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3267,3273 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2844,2850 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3276,3282 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2856,2862 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3288,3294 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2868,2874 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3300,3306 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2880,2886 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3312,3318 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2892,2898 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3324,3330 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
Index: tree-vect-patterns.c
===================================================================
*** tree-vect-patterns.c (revision 127202)
--- tree-vect-patterns.c (working copy)
*************** widened_name_p (tree name, tree use_stmt
*** 148,154 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
! */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 148,161 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
!
! Note: The dot-prod idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized).  */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
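As a rough illustration of the note above (scalar pseudo-code, not taken from the
patch): for a vector of 4 products the dot-product idiom keeps only 2 widened
partial sums, something like

res[0] = sum[0] + x[0]*y[0] + x[1]*y[1];
res[1] = sum[1] + x[2]*y[2] + x[3]*y[3];

so neither all 4 intermediate products nor their original summation order is
preserved, which is exactly what must be preserved when the computation sits in
an inner-loop nested in the outer-loop being vectorized.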
*************** vect_recog_dot_prod_pattern (tree last_s
*** 160,165 ****
--- 167,174 ----
tree type, half_type;
tree pattern_expr;
tree prod_type;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 242,247 ****
--- 251,260 ----
gcc_assert (stmt_vinfo);
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_loop_def)
return NULL;
+ /* FORNOW. We cannot yet continue analyzing the def-use chain when this stmt
+ is a phi inside the loop (which can happen when analyzing an outer-loop). */
+ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
+ return NULL;
expr = GIMPLE_STMT_OPERAND (stmt, 1);
if (TREE_CODE (expr) != MULT_EXPR)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 295,300 ****
--- 308,323 ----
fprintf (vect_dump, "vect_recog_dot_prod_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_dot_prod_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
*************** vect_recog_pow_pattern (tree last_stmt,
*** 521,527 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
! */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 544,557 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
!
! Note: The widening-sum idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized).  */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
*************** vect_recog_widen_sum_pattern (tree last_
*** 531,536 ****
--- 561,568 ----
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
tree type, half_type;
tree pattern_expr;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_widen_sum_pattern (tree last_
*** 580,585 ****
--- 612,627 ----
fprintf (vect_dump, "vect_recog_widen_sum_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_widen_sum_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
Index: tree-vect-transform.c
===================================================================
*** tree-vect-transform.c (revision 127202)
--- tree-vect-transform.c (working copy)
*************** along with GCC; see the file COPYING3.
*** 49,62 ****
static bool vect_transform_stmt (tree, block_stmt_iterator *, bool *);
static tree vect_create_destination_var (tree, tree);
static tree vect_create_data_ref_ptr
! (tree, block_stmt_iterator *, tree, tree *, tree *, bool, tree);
! static tree vect_create_addr_base_for_vector_ref (tree, tree *, tree);
! static tree vect_setup_realignment (tree, block_stmt_iterator *, tree *);
static tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
static tree vect_get_vec_def_for_operand (tree, tree, tree *);
! static tree vect_init_vector (tree, tree, tree);
static void vect_finish_stmt_generation
! (tree stmt, tree vec_stmt, block_stmt_iterator *bsi);
static bool vect_is_simple_cond (tree, loop_vec_info);
static void update_vuses_to_preheader (tree, struct loop*);
static void vect_create_epilog_for_reduction (tree, tree, enum tree_code, tree);
--- 49,62 ----
static bool vect_transform_stmt (tree, block_stmt_iterator *, bool *);
static tree vect_create_destination_var (tree, tree);
static tree vect_create_data_ref_ptr
! (tree, struct loop*, tree, tree *, tree *, bool, tree, bool *);
! static tree vect_create_addr_base_for_vector_ref
! (tree, tree *, tree, struct loop *);
static tree vect_get_new_vect_var (tree, enum vect_var_kind, const char *);
static tree vect_get_vec_def_for_operand (tree, tree, tree *);
! static tree vect_init_vector (tree, tree, tree, block_stmt_iterator *);
static void vect_finish_stmt_generation
! (tree stmt, tree vec_stmt, block_stmt_iterator *);
static bool vect_is_simple_cond (tree, loop_vec_info);
static void update_vuses_to_preheader (tree, struct loop*);
static void vect_create_epilog_for_reduction (tree, tree, enum tree_code, tree);
*************** vect_estimate_min_profitable_iters (loop
*** 125,130 ****
--- 125,131 ----
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
int nbbs = loop->num_nodes;
int byte_misalign;
+ int innerloop_iters, factor;
/* Cost model disabled. */
if (!flag_vect_cost_model)
*************** vect_estimate_min_profitable_iters (loop
*** 153,163 ****
--- 154,173 ----
TODO: Consider assigning different costs to different scalar
statements. */
+ /* FORNOW. */
+ if (loop->inner)
+ innerloop_iters = 50; /* FIXME */
+
for (i = 0; i < nbbs; i++)
{
block_stmt_iterator si;
basic_block bb = bbs[i];
+ if (bb->loop_father == loop->inner)
+ factor = innerloop_iters;
+ else
+ factor = 1;
+
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
*************** vect_estimate_min_profitable_iters (loop
*** 165,172 ****
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt);
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
--- 175,184 ----
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt) * factor;
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
! /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
! some of the "outside" costs are generated inside the outer-loop. */
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
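As a small worked example of the scaling above (numbers invented): with the
FORNOW guess of innerloop_iters = 50, a relevant stmt in the inner-loop whose
cost_for_stmt is 1 contributes 1 * 50 = 50 to scalar_single_iter_cost, and its
inside-of-loop vector cost is scaled by the same factor of 50, while a stmt in
the outer-loop body is counted with factor 1.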
*************** vect_model_load_cost (stmt_vec_info stmt
*** 598,604 ****
break;
}
! case dr_unaligned_software_pipeline:
{
int outer_cost = 0;
--- 610,628 ----
break;
}
! case dr_explicit_realign:
! {
! inner_cost += ncopies * (2*TARG_VEC_LOAD_COST + TARG_VEC_STMT_COST);
!
! /* FIXME: If the misalignment remains fixed across the iterations of
! the containing loop, the following cost should be added to the
! outside costs. */
! if (targetm.vectorize.builtin_mask_for_load)
! inner_cost += TARG_VEC_STMT_COST;
!
! break;
! }
! case dr_explicit_realign_optimized:
{
int outer_cost = 0;
*************** vect_get_new_vect_var (tree type, enum v
*** 695,700 ****
--- 719,737 ----
STMT: The statement containing the data reference.
NEW_STMT_LIST: Must be initialized to NULL_TREE or a statement list.
OFFSET: Optional. If supplied, it is be added to the initial address.
+ LOOP: Specifies the loop relative to which the address should be computed.
+ For example, when the dataref is in an inner-loop nested in an
+ outer-loop that is now being vectorized, LOOP can be either the
+ outer-loop, or the inner-loop. The first memory location accessed
+ by the following dataref ('in' points to short):
+
+ for (i=0; i<N; i++)
+ for (j=0; j<M; j++)
+ s += in[i+j]
+
+ is as follows:
+ if LOOP=i_loop: &in (relative to i_loop)
+ if LOOP=j_loop: &in+i*2B (relative to j_loop)
Output:
1. Return an SSA_NAME whose value is the address of the memory location of
*************** vect_get_new_vect_var (tree type, enum v
*** 707,720 ****
static tree
vect_create_addr_base_for_vector_ref (tree stmt,
tree *new_stmt_list,
! tree offset)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
! tree data_ref_base_expr = unshare_expr (DR_BASE_ADDRESS (dr));
! tree base_name = build_fold_indirect_ref (data_ref_base_expr);
tree data_ref_base_var;
- tree data_ref_base;
tree new_base_stmt;
tree vec_stmt;
tree addr_base, addr_expr;
--- 744,758 ----
static tree
vect_create_addr_base_for_vector_ref (tree stmt,
tree *new_stmt_list,
! tree offset,
! struct loop *loop)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
! struct loop *containing_loop = (bb_for_stmt (stmt))->loop_father;
! tree data_ref_base = unshare_expr (DR_BASE_ADDRESS (dr));
! tree base_name;
tree data_ref_base_var;
tree new_base_stmt;
tree vec_stmt;
tree addr_base, addr_expr;
*************** vect_create_addr_base_for_vector_ref (tr
*** 722,733 ****
tree base_offset = unshare_expr (DR_OFFSET (dr));
tree init = unshare_expr (DR_INIT (dr));
tree vect_ptr_type, addr_expr2;
!
!
! /* Create data_ref_base */
! data_ref_base_var = create_tmp_var (TREE_TYPE (data_ref_base_expr), "batmp");
add_referenced_var (data_ref_base_var);
! data_ref_base = force_gimple_operand (data_ref_base_expr, &new_base_stmt,
true, data_ref_base_var);
append_to_statement_list_force(new_base_stmt, new_stmt_list);
--- 760,785 ----
tree base_offset = unshare_expr (DR_OFFSET (dr));
tree init = unshare_expr (DR_INIT (dr));
tree vect_ptr_type, addr_expr2;
! tree step = TYPE_SIZE_UNIT (TREE_TYPE (DR_REF (dr)));
!
! gcc_assert (loop);
! if (loop != containing_loop)
! {
! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
!
! gcc_assert (nested_in_vect_loop_p (loop, stmt));
!
! data_ref_base = unshare_expr (STMT_VINFO_DR_BASE_ADDRESS (stmt_info));
! base_offset = unshare_expr (STMT_VINFO_DR_OFFSET (stmt_info));
! init = unshare_expr (STMT_VINFO_DR_INIT (stmt_info));
! }
!
! /* Create base_offset */
! base_name = build_fold_indirect_ref (data_ref_base);
! data_ref_base_var = create_tmp_var (TREE_TYPE (data_ref_base), "batmp");
add_referenced_var (data_ref_base_var);
! data_ref_base = force_gimple_operand (data_ref_base, &new_base_stmt,
true, data_ref_base_var);
append_to_statement_list_force(new_base_stmt, new_stmt_list);
*************** vect_create_addr_base_for_vector_ref (tr
*** 742,757 ****
if (offset)
{
tree tmp = create_tmp_var (sizetype, "offset");
- tree step;
-
- /* For interleaved access step we divide STEP by the size of the
- interleaving group. */
- if (DR_GROUP_SIZE (stmt_info))
- step = fold_build2 (TRUNC_DIV_EXPR, TREE_TYPE (offset), DR_STEP (dr),
- build_int_cst (TREE_TYPE (offset),
- DR_GROUP_SIZE (stmt_info)));
- else
- step = DR_STEP (dr);
add_referenced_var (tmp);
offset = fold_build2 (MULT_EXPR, TREE_TYPE (offset), offset, step);
--- 794,799 ----
*************** vect_create_addr_base_for_vector_ref (tr
*** 800,806 ****
1. STMT: a stmt that references memory. Expected to be of the form
GIMPLE_MODIFY_STMT <name, data-ref> or
GIMPLE_MODIFY_STMT <data-ref, name>.
! 2. BSI: block_stmt_iterator where new stmts can be added.
3. OFFSET (optional): an offset to be added to the initial address accessed
by the data-ref in STMT.
4. ONLY_INIT: indicate if vp is to be updated in the loop, or remain
--- 842,848 ----
1. STMT: a stmt that references memory. Expected to be of the form
GIMPLE_MODIFY_STMT <name, data-ref> or
GIMPLE_MODIFY_STMT <data-ref, name>.
! 2. AT_LOOP: the loop where the vector memref is to be created.
3. OFFSET (optional): an offset to be added to the initial address accessed
by the data-ref in STMT.
4. ONLY_INIT: indicate if vp is to be updated in the loop, or remain
*************** vect_create_addr_base_for_vector_ref (tr
*** 827,844 ****
Return the increment stmt that updates the pointer in PTR_INCR.
! 3. Return the pointer. */
static tree
! vect_create_data_ref_ptr (tree stmt,
! block_stmt_iterator *bsi ATTRIBUTE_UNUSED,
tree offset, tree *initial_address, tree *ptr_incr,
! bool only_init, tree type)
{
tree base_name;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree vect_ptr_type;
tree vect_ptr;
--- 869,890 ----
Return the increment stmt that updates the pointer in PTR_INCR.
! 3. Set INV_P to true if the access pattern of the data reference in the
! vectorized loop is invariant. Set it to false otherwise.
!
! 4. Return the pointer. */
static tree
! vect_create_data_ref_ptr (tree stmt, struct loop *at_loop,
tree offset, tree *initial_address, tree *ptr_incr,
! bool only_init, tree type, bool *inv_p)
{
tree base_name;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
+ struct loop *containing_loop = (bb_for_stmt (stmt))->loop_father;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree vect_ptr_type;
tree vect_ptr;
*************** vect_create_data_ref_ptr (tree stmt,
*** 846,856 ****
tree new_temp;
tree vec_stmt;
tree new_stmt_list = NULL_TREE;
! edge pe = loop_preheader_edge (loop);
basic_block new_bb;
tree vect_ptr_init;
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
base_name = build_fold_indirect_ref (unshare_expr (DR_BASE_ADDRESS (dr)));
if (vect_print_dump_info (REPORT_DETAILS))
--- 892,922 ----
tree new_temp;
tree vec_stmt;
tree new_stmt_list = NULL_TREE;
! edge pe;
basic_block new_bb;
tree vect_ptr_init;
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
+ tree vptr;
+ block_stmt_iterator incr_bsi;
+ bool insert_after;
+ tree indx_before_incr, indx_after_incr;
+ tree incr;
+ tree step;
+
+ /* Check the step (evolution) of the load in LOOP, and record
+ whether it's invariant. */
+ if (nested_in_vect_loop)
+ step = STMT_VINFO_DR_STEP (stmt_info);
+ else
+ step = DR_STEP (STMT_VINFO_DATA_REF (stmt_info));
+
+ if (tree_int_cst_compare (step, size_zero_node) == 0)
+ *inv_p = true;
+ else
+ *inv_p = false;
+ /* Create an expression for the first address accessed by this load
+ in LOOP. */
base_name = build_fold_indirect_ref (unshare_expr (DR_BASE_ADDRESS (dr)));
if (vect_print_dump_info (REPORT_DETAILS))
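To make the invariance check above concrete (an illustrative sketch; x, y, z,
n, m are made-up names): when vectorizing the outer i-loop of

for (i=0; i<n; i++)
  for (j=0; j<m; j++)
    x[i+j] = y[i+j] & z[j];

the access y[i+j] has a non-zero STMT_VINFO_DR_STEP relative to the i-loop,
whereas z[j] has a zero outer step (its address does not change with i), so its
pointer is marked invariant (*inv_p = true) and, per step (4) below, gets a zero
step in the vectorized loop.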
*************** vect_create_data_ref_ptr (tree stmt,
*** 893,904 ****
var_ann (vect_ptr)->subvars = DR_SUBVARS (dr);
/** (3) Calculate the initial address the vector-pointer, and set
the vector-pointer to point to it before the loop: **/
/* Create: (&(base[init_val+offset]) in the loop preheader. */
new_temp = vect_create_addr_base_for_vector_ref (stmt, &new_stmt_list,
! offset);
pe = loop_preheader_edge (loop);
new_bb = bsi_insert_on_edge_immediate (pe, new_stmt_list);
gcc_assert (!new_bb);
--- 959,1002 ----
var_ann (vect_ptr)->subvars = DR_SUBVARS (dr);
+ /** Note: If the dataref is in an inner-loop nested in LOOP, and we are
+ vectorizing LOOP (i.e. outer-loop vectorization), we need to create two
+ def-use update cycles for the pointer: One relative to the outer-loop
+ (LOOP), which is what steps (3) and (4) below do. The other is relative
+ to the inner-loop (which is the inner-most loop containing the dataref),
+ and this is done by step (5) below.
+
+ When vectorizing inner-most loops, the vectorized loop (LOOP) is also the
+ inner-most loop, and so steps (3),(4) work the same, and step (5) is
+ redundant. Steps (3),(4) create the following:
+
+ vp0 = &base_addr;
+ LOOP: vp1 = phi(vp0,vp2)
+ ...
+ ...
+ vp2 = vp1 + step
+ goto LOOP
+
+ If there is an inner-loop nested in LOOP, then step (5) will also be
+ applied, and an additional update in the inner-loop will be created:
+
+ vp0 = &base_addr;
+ LOOP: vp1 = phi(vp0,vp2)
+ ...
+ inner: vp3 = phi(vp1,vp4)
+ vp4 = vp3 + inner_step
+ if () goto inner
+ ...
+ vp2 = vp1 + step
+ if () goto LOOP */
+
/** (3) Calculate the initial address the vector-pointer, and set
the vector-pointer to point to it before the loop: **/
/* Create: (&(base[init_val+offset]) in the loop preheader. */
+
new_temp = vect_create_addr_base_for_vector_ref (stmt, &new_stmt_list,
! offset, loop);
pe = loop_preheader_edge (loop);
new_bb = bsi_insert_on_edge_immediate (pe, new_stmt_list);
gcc_assert (!new_bb);
*************** vect_create_data_ref_ptr (tree stmt,
*** 913,937 ****
gcc_assert (!new_bb);
! /** (4) Handle the updating of the vector-pointer inside the loop: **/
! if (only_init) /* No update in loop is required. */
{
/* Copy the points-to information if it exists. */
if (DR_PTR_INFO (dr))
duplicate_ssa_name_ptr_info (vect_ptr_init, DR_PTR_INFO (dr));
! return vect_ptr_init;
}
else
{
! block_stmt_iterator incr_bsi;
! bool insert_after;
! tree indx_before_incr, indx_after_incr;
! tree incr;
standard_iv_increment_position (loop, &incr_bsi, &insert_after);
create_iv (vect_ptr_init,
! fold_convert (vect_ptr_type, TYPE_SIZE_UNIT (vectype)),
NULL_TREE, loop, &incr_bsi, insert_after,
&indx_before_incr, &indx_after_incr);
incr = bsi_stmt (incr_bsi);
--- 1011,1041 ----
gcc_assert (!new_bb);
! /** (4) Handle the updating of the vector-pointer inside the loop.
! This is needed when ONLY_INIT is false, and also when AT_LOOP
! is the inner-loop nested in LOOP (during outer-loop vectorization).
! **/
! if (only_init && at_loop == loop) /* No update in loop is required. */
{
/* Copy the points-to information if it exists. */
if (DR_PTR_INFO (dr))
duplicate_ssa_name_ptr_info (vect_ptr_init, DR_PTR_INFO (dr));
! vptr = vect_ptr_init;
}
else
{
! /* The step of the vector pointer is the vector size. */
! tree step = TYPE_SIZE_UNIT (vectype);
! /* One exception to the above is when the scalar step of the load in
! LOOP is zero. In this case the step here is also zero. */
! if (*inv_p)
! step = size_zero_node;
standard_iv_increment_position (loop, &incr_bsi, &insert_after);
+
create_iv (vect_ptr_init,
! fold_convert (vect_ptr_type, step),
NULL_TREE, loop, &incr_bsi, insert_after,
&indx_before_incr, &indx_after_incr);
incr = bsi_stmt (incr_bsi);
*************** vect_create_data_ref_ptr (tree stmt,
*** 949,963 ****
if (ptr_incr)
*ptr_incr = incr;
! return indx_before_incr;
}
}
/* Function bump_vector_ptr
! Increment a pointer (to a vector type) by vector-size. Connect the new
! increment stmt to the existing def-use update-chain of the pointer.
The pointer def-use update-chain before this function:
DATAREF_PTR = phi (p_0, p_2)
--- 1053,1103 ----
if (ptr_incr)
*ptr_incr = incr;
! vptr = indx_before_incr;
}
+
+ if (!nested_in_vect_loop || only_init)
+ return vptr;
+
+
+ /** (5) Handle the updating of the vector-pointer inside the inner-loop
+ nested in LOOP, if it exists: **/
+
+ gcc_assert (nested_in_vect_loop);
+ if (!only_init)
+ {
+ standard_iv_increment_position (containing_loop, &incr_bsi,
+ &insert_after);
+ create_iv (vptr, fold_convert (vect_ptr_type, DR_STEP (dr)), NULL_TREE,
+ containing_loop, &incr_bsi, insert_after, &indx_before_incr,
+ &indx_after_incr);
+ incr = bsi_stmt (incr_bsi);
+ set_stmt_info (stmt_ann (incr), new_stmt_vec_info (incr, loop_vinfo));
+
+ /* Copy the points-to information if it exists. */
+ if (DR_PTR_INFO (dr))
+ {
+ duplicate_ssa_name_ptr_info (indx_before_incr, DR_PTR_INFO (dr));
+ duplicate_ssa_name_ptr_info (indx_after_incr, DR_PTR_INFO (dr));
+ }
+ merge_alias_info (vect_ptr_init, indx_before_incr);
+ merge_alias_info (vect_ptr_init, indx_after_incr);
+ if (ptr_incr)
+ *ptr_incr = incr;
+
+ return indx_before_incr;
+ }
+ else
+ gcc_unreachable ();
}
/* Function bump_vector_ptr
! Increment a pointer (to a vector type) by vector-size. If requested,
! i.e. if PTR_INCR is given, then also connect the new increment stmt
! to the existing def-use update-chain of the pointer, by modifying
! the PTR_INCR as illustrated below:
The pointer def-use update-chain before this function:
DATAREF_PTR = phi (p_0, p_2)
*************** vect_create_data_ref_ptr (tree stmt,
*** 967,984 ****
The pointer def-use update-chain after this function:
DATAREF_PTR = phi (p_0, p_2)
....
! NEW_DATAREF_PTR = DATAREF_PTR + vector_size
....
PTR_INCR: p_2 = NEW_DATAREF_PTR + step
Input:
DATAREF_PTR - ssa_name of a pointer (to vector type) that is being updated
in the loop.
! PTR_INCR - the stmt that updates the pointer in each iteration of the loop.
! The increment amount across iterations is also expected to be
! vector_size.
BSI - location where the new update stmt is to be placed.
STMT - the original scalar memory-access stmt that is being vectorized.
Output: Return NEW_DATAREF_PTR as illustrated above.
--- 1107,1126 ----
The pointer def-use update-chain after this function:
DATAREF_PTR = phi (p_0, p_2)
....
! NEW_DATAREF_PTR = DATAREF_PTR + BUMP
....
PTR_INCR: p_2 = NEW_DATAREF_PTR + step
Input:
DATAREF_PTR - ssa_name of a pointer (to vector type) that is being updated
in the loop.
! PTR_INCR - optional. The stmt that updates the pointer in each iteration of
! the loop. The increment amount across iterations is expected
! to be vector_size.
BSI - location where the new update stmt is to be placed.
STMT - the original scalar memory-access stmt that is being vectorized.
+ BUMP - optional. The offset by which to bump the pointer. If not given,
+ the offset is assumed to be vector_size.
Output: Return NEW_DATAREF_PTR as illustrated above.
*************** vect_create_data_ref_ptr (tree stmt,
*** 986,992 ****
static tree
bump_vector_ptr (tree dataref_ptr, tree ptr_incr, block_stmt_iterator *bsi,
! tree stmt)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
--- 1128,1134 ----
static tree
bump_vector_ptr (tree dataref_ptr, tree ptr_incr, block_stmt_iterator *bsi,
! tree stmt, tree bump)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info);
*************** bump_vector_ptr (tree dataref_ptr, tree
*** 999,1004 ****
--- 1141,1149 ----
use_operand_p use_p;
tree new_dataref_ptr;
+ if (bump)
+ update = bump;
+
incr_stmt = build_gimple_modify_stmt (ptr_var,
build2 (POINTER_PLUS_EXPR, vptr_type,
dataref_ptr, update));
*************** bump_vector_ptr (tree dataref_ptr, tree
*** 1006,1011 ****
--- 1151,1164 ----
GIMPLE_STMT_OPERAND (incr_stmt, 0) = new_dataref_ptr;
vect_finish_stmt_generation (stmt, incr_stmt, bsi);
+ /* Copy the points-to information if it exists. */
+ if (DR_PTR_INFO (dr))
+ duplicate_ssa_name_ptr_info (new_dataref_ptr, DR_PTR_INFO (dr));
+ merge_alias_info (new_dataref_ptr, dataref_ptr);
+
+ if (!ptr_incr)
+ return new_dataref_ptr;
+
/* Update the vector-pointer's cross-iteration increment. */
FOR_EACH_SSA_USE_OPERAND (use_p, ptr_incr, iter, SSA_OP_USE)
{
*************** bump_vector_ptr (tree dataref_ptr, tree
*** 1017,1027 ****
gcc_assert (tree_int_cst_compare (use, update) == 0);
}
- /* Copy the points-to information if it exists. */
- if (DR_PTR_INFO (dr))
- duplicate_ssa_name_ptr_info (new_dataref_ptr, DR_PTR_INFO (dr));
- merge_alias_info (new_dataref_ptr, dataref_ptr);
-
return new_dataref_ptr;
}
--- 1170,1175 ----
*************** vect_create_destination_var (tree scalar
*** 1056,1070 ****
/* Function vect_init_vector.
Insert a new stmt (INIT_STMT) that initializes a new vector variable with
! the vector elements of VECTOR_VAR. Return the DEF of INIT_STMT. It will be
! used in the vectorization of STMT. */
static tree
! vect_init_vector (tree stmt, tree vector_var, tree vector_type)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
- loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree new_var;
tree init_stmt;
tree vec_oprnd;
--- 1204,1219 ----
/* Function vect_init_vector.
Insert a new stmt (INIT_STMT) that initializes a new vector variable with
! the vector elements of VECTOR_VAR. Place the initialization at BSI if it
! is not NULL. Otherwise, place the initialization at the loop preheader.
! Return the DEF of INIT_STMT.
! It will be used in the vectorization of STMT. */
static tree
! vect_init_vector (tree stmt, tree vector_var, tree vector_type,
! block_stmt_iterator *bsi)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
tree new_var;
tree init_stmt;
tree vec_oprnd;
*************** vect_init_vector (tree stmt, tree vector
*** 1074,1087 ****
new_var = vect_get_new_vect_var (vector_type, vect_simple_var, "cst_");
add_referenced_var (new_var);
-
init_stmt = build_gimple_modify_stmt (new_var, vector_var);
new_temp = make_ssa_name (new_var, init_stmt);
GIMPLE_STMT_OPERAND (init_stmt, 0) = new_temp;
! pe = loop_preheader_edge (loop);
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
if (vect_print_dump_info (REPORT_DETAILS))
{
--- 1223,1245 ----
new_var = vect_get_new_vect_var (vector_type, vect_simple_var, "cst_");
add_referenced_var (new_var);
init_stmt = build_gimple_modify_stmt (new_var, vector_var);
new_temp = make_ssa_name (new_var, init_stmt);
GIMPLE_STMT_OPERAND (init_stmt, 0) = new_temp;
! if (bsi)
! vect_finish_stmt_generation (stmt, init_stmt, bsi);
! else
! {
! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
!
! if (nested_in_vect_loop_p (loop, stmt))
! loop = loop->inner;
! pe = loop_preheader_edge (loop);
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! }
if (vect_print_dump_info (REPORT_DETAILS))
{
*************** vect_init_vector (tree stmt, tree vector
*** 1097,1102 ****
--- 1255,1261 ----
/* Function get_initial_def_for_induction
Input:
+ STMT - a stmt that performs an induction operation in the loop.
IV_PHI - the initial value of the induction variable
Output:
*************** get_initial_def_for_induction (tree iv_p
*** 1115,1122 ****
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
basic_block new_bb;
- block_stmt_iterator bsi;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
--- 1274,1281 ----
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
+ struct loop *iv_loop;
basic_block new_bb;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
*************** get_initial_def_for_induction (tree iv_p
*** 1130,1137 ****
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
tree stmts;
! tree stmt = NULL_TREE;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
--- 1289,1301 ----
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
+ bool nested_in_vect_loop = false;
tree stmts;
! imm_use_iterator imm_iter;
! use_operand_p use_p;
! tree exit_phi;
! edge latch_e;
! tree loop_arg;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1140,1204 ****
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
- stmt = bsi_stmt (si);
! access_fn = analyze_scalar_evolution (loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
/* Create the vector that holds the initial_value of the induction. */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
!
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
{
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
}
!
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, new_name, t);
! for (i = 1; i < nunits; i++)
{
! tree tmp;
! /* Create: new_name = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
}
- vec = build_constructor_from_list (vectype, nreverse (t));
- vec_init = vect_init_vector (stmt, vec, vectype);
/* Create the vector that holds the step of the induction. */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
/* Create the following def-use cycle:
loop prolog:
! vec_init = [X, X+S, X+2*S, X+3*S]
! vec_step = [VF*S, VF*S, VF*S, VF*S]
loop:
vec_iv = PHI <vec_init, vec_loop>
...
--- 1304,1410 ----
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
! if (INTEGRAL_TYPE_P (scalar_type))
! step_expr = build_int_cst (scalar_type, 0);
! else
! step_expr = build_real (scalar_type, dconst0);
!
! /* Is the phi in an inner-loop, while vectorizing an enclosing outer-loop? */
! if (nested_in_vect_loop_p (loop, iv_phi))
! {
! nested_in_vect_loop = true;
! iv_loop = loop->inner;
! }
! else
! iv_loop = loop;
! gcc_assert (iv_loop == (bb_for_stmt (iv_phi))->loop_father);
!
! latch_e = loop_latch_edge (iv_loop);
! loop_arg = PHI_ARG_DEF_FROM_EDGE (iv_phi, latch_e);
!
! access_fn = analyze_scalar_evolution (iv_loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (iv_loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
+ pe = loop_preheader_edge (iv_loop);
/* Create the vector that holds the initial_value of the induction. */
! if (nested_in_vect_loop)
{
! /* iv_loop is nested in the loop to be vectorized. init_expr has already
! been created during vectorization of previous stmts; we obtain it from
! the STMT_VINFO_VEC_STMT of the defining stmt. */
! tree iv_def = PHI_ARG_DEF_FROM_EDGE (iv_phi, loop_preheader_edge (iv_loop));
! vec_init = vect_get_vec_def_for_operand (iv_def, iv_phi, NULL);
}
! else
{
! /* iv_loop is the loop to be vectorized. Create:
! vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr) */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
! {
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
! }
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, init_expr, t);
! for (i = 1; i < nunits; i++)
! {
! tree tmp;
! /* Create: new_name_i = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
!
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
!
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
! }
! /* Create a vector from [new_name_0, new_name_1, ..., new_name_nunits-1] */
! vec = build_constructor_from_list (vectype, nreverse (t));
! vec_init = vect_init_vector (iv_phi, vec, vectype, NULL);
}
/* Create the vector that holds the step of the induction. */
! if (nested_in_vect_loop)
! /* iv_loop is nested in the loop to be vectorized. Generate:
! vec_step = [S, S, S, S] */
! new_name = step_expr;
! else
! {
! /* iv_loop is the loop to be vectorized. Generate:
! vec_step = [VF*S, VF*S, VF*S, VF*S] */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
! }
!
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype, NULL);
/* Create the following def-use cycle:
loop prolog:
! vec_init = ...
! vec_step = ...
loop:
vec_iv = PHI <vec_init, vec_loop>
...
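As a concrete instance of the two cases handled above (values invented for
illustration): with VF = nunits = 4, init_expr X = 0 and step_expr S = 1,
vectorizing the loop that contains the induction yields vec_init = [0,1,2,3]
and vec_step = [4,4,4,4]; for an induction in an inner-loop nested in the
vectorized outer-loop, vec_init is instead taken from the vectorized outer-loop
definition and vec_step = [1,1,1,1], so each inner-loop iteration advances all
four outer-loop instances of the IV by one scalar step.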
*************** get_initial_def_for_induction (tree iv_p
*** 1209,1215 ****
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
--- 1415,1421 ----
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, iv_loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1220,1234 ****
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, loop_preheader_edge (loop));
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (loop));
! /* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
--- 1426,1441 ----
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, pe);
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop));
! /* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
*************** get_initial_def_for_induction (tree iv_p
*** 1237,1242 ****
--- 1444,1451 ----
if (ncopies > 1)
{
stmt_vec_info prev_stmt_vinfo;
+ /* FORNOW. This restriction should be relaxed. */
+ gcc_assert (!nested_in_vect_loop);
/* Create the vector that holds the step of the induction. */
expr = build_int_cst (scalar_type, nunits);
*************** get_initial_def_for_induction (tree iv_p
*** 1245,1251 ****
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
--- 1454,1460 ----
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype, NULL);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1253,1271 ****
{
tree tmp;
! /* vec_i = vec_prev + vec_{step*nunits} */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
!
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
--- 1462,1511 ----
{
tree tmp;
! /* vec_i = vec_prev + vec_step */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
+ if (nested_in_vect_loop)
+ {
+ /* Find the loop-closed exit-phi of the induction, and record
+ the final vector of induction results: */
+ exit_phi = NULL;
+ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
+ {
+ if (!flow_bb_inside_loop_p (iv_loop, bb_for_stmt (USE_STMT (use_p))))
+ {
+ exit_phi = USE_STMT (use_p);
+ break;
+ }
+ }
+ if (exit_phi)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+ /* FORNOW. Currently not supporting the case that an inner-loop induction
+ is not used in the outer-loop (i.e. only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = new_stmt;
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of inductions after inner-loop:");
+ print_generic_expr (vect_dump, new_stmt, TDF_SLIM);
+ }
+ }
+ }
+
+
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
*************** vect_get_vec_def_for_operand (tree op, t
*** 1301,1307 ****
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vec_inv;
tree vec_cst;
tree t = NULL_TREE;
--- 1541,1546 ----
*************** vect_get_vec_def_for_operand (tree op, t
*** 1352,1358 ****
vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
vec_cst = build_vector (vector_type, t);
! return vect_init_vector (stmt, vec_cst, vector_type);
}
/* Case 2: operand is defined outside the loop - loop invariant. */
--- 1591,1597 ----
vector_type = get_vectype_for_scalar_type (TREE_TYPE (op));
vec_cst = build_vector (vector_type, t);
! return vect_init_vector (stmt, vec_cst, vector_type, NULL);
}
/* Case 2: operand is defined outside the loop - loop invariant. */
*************** vect_get_vec_def_for_operand (tree op, t
*** 1373,1380 ****
/* FIXME: use build_constructor directly. */
vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
vec_inv = build_constructor_from_list (vector_type, t);
!
! return vect_init_vector (stmt, vec_inv, vector_type);
}
/* Case 3: operand is defined inside the loop. */
--- 1612,1618 ----
/* FIXME: use build_constructor directly. */
vector_type = get_vectype_for_scalar_type (TREE_TYPE (def));
vec_inv = build_constructor_from_list (vector_type, t);
! return vect_init_vector (stmt, vec_inv, vector_type, NULL);
}
/* Case 3: operand is defined inside the loop. */
*************** vect_get_vec_def_for_operand (tree op, t
*** 1387,1400 ****
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
--- 1625,1644 ----
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! if (TREE_CODE (vec_stmt) == PHI_NODE)
! vec_oprnd = PHI_RESULT (vec_stmt);
! else
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
+ struct loop *loop;
+
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
+ loop = (bb_for_stmt (def_stmt))->loop_father;
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
*************** vect_get_vec_def_for_operand (tree op, t
*** 1406,1413 ****
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def before the loop */
! return get_initial_def_for_induction (def_stmt);
}
default:
--- 1650,1661 ----
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def from the vectorized stmt. */
! def_stmt_info = vinfo_for_stmt (def_stmt);
! vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
! gcc_assert (vec_stmt && (TREE_CODE (vec_stmt) == PHI_NODE));
! vec_oprnd = PHI_RESULT (vec_stmt);
! return vec_oprnd;
}
default:
*************** vect_get_vec_def_for_stmt_copy (enum vec
*** 1488,1494 ****
vec_stmt_for_operand = STMT_VINFO_RELATED_STMT (def_stmt_info);
gcc_assert (vec_stmt_for_operand);
vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt_for_operand, 0);
-
return vec_oprnd;
}
--- 1736,1741 ----
*************** vect_finish_stmt_generation (tree stmt,
*** 1504,1510 ****
--- 1751,1761 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (stmt == bsi_stmt (*bsi));
+ gcc_assert (TREE_CODE (stmt) != LABEL_EXPR);
+
bsi_insert_before (bsi, vec_stmt, BSI_SAME_STMT);
+
set_stmt_info (get_stmt_ann (vec_stmt),
new_stmt_vec_info (vec_stmt, loop_vinfo));
*************** static tree
*** 1572,1577 ****
--- 1823,1830 ----
get_initial_def_for_reduction (tree stmt, tree init_val, tree *adjustment_def)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
*************** get_initial_def_for_reduction (tree stmt
*** 1582,1589 ****
--- 1835,1848 ----
tree t = NULL_TREE;
int i;
tree vector_type;
+ bool nested_in_vect_loop = false;
gcc_assert (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
+ if (nested_in_vect_loop_p (loop, stmt))
+ nested_in_vect_loop = true;
+ else
+ gcc_assert (loop == (bb_for_stmt (stmt))->loop_father);
+
vecdef = vect_get_vec_def_for_operand (init_val, stmt, NULL);
switch (code)
*************** get_initial_def_for_reduction (tree stmt
*** 1591,1597 ****
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
--- 1850,1859 ----
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! if (nested_in_vect_loop)
! *adjustment_def = vecdef;
! else
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
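
To illustrate the init_def / adjustment_def split for a PLUS-like reduction (sketch only, plain GNU C; the function and parameter names are made up): the vectorized loop accumulates into a vector of zeros, and the reduction's original initial value is folded back in afterwards, into the final scalar in the usual case, or into the vector of partial results in the nested case (see "2.5 Adjust the final result" further down):

typedef int v4si __attribute__ ((vector_size (16)));

/* Illustration only: non-nested PLUS reduction over already-vectorized chunks.  */
static int
reduce_plus (const v4si *chunks, int nchunks, int init_val)
{
  v4si acc = { 0, 0, 0, 0 };                /* init_def: vector of zeros        */
  for (int i = 0; i < nchunks; i++)
    acc = acc + chunks[i];                  /* vectorized loop body             */

  union { v4si v; int e[4]; } u;            /* reduction epilog: sum the lanes  */
  u.v = acc;
  int sum = u.e[0] + u.e[1] + u.e[2] + u.e[3];
  return sum + init_val;                    /* adjustment_def = init_val        */
}
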
*************** vect_create_epilog_for_reduction (tree v
*** 1680,1703 ****
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp;
tree new_name;
! tree epilog_stmt;
! tree new_scalar_dest, exit_phi;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree scalar_initial_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result;
! tree reduction_op;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
int op_type;
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
--- 1942,1972 ----
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp = NULL_TREE;
tree new_name;
! tree epilog_stmt = NULL_TREE;
! tree new_scalar_dest, exit_phi, new_dest;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree adjustment_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result = false;
! tree reduction_op, expr;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
+ bool nested_in_vect_loop = false;
int op_type;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ nested_in_vect_loop = true;
+ }
+
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
*************** vect_create_epilog_for_reduction (tree v
*** 1710,1716 ****
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &scalar_initial_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
--- 1979,1985 ----
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &adjustment_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
*************** vect_create_epilog_for_reduction (tree v
*** 1789,1794 ****
--- 2058,2072 ----
bitsize = TYPE_SIZE (scalar_type);
bytesize = TYPE_SIZE_UNIT (scalar_type);
+
+ /* In case this is a reduction in an inner-loop while vectorizing an outer
+ loop - we don't need to extract a single scalar result at the end of the
+ inner-loop. The final vector of partial results will be used in the
+ vectorized outer-loop, or reduced to a scalar result at the end of the
+ outer-loop. */
+ if (nested_in_vect_loop)
+ goto vect_finalize_reduction;
+
/* 2.3 Create the reduction code, using one of the three schemes described
above. */
*************** vect_create_epilog_for_reduction (tree v
*** 1935,1940 ****
--- 2213,2219 ----
{
tree rhs;
+ gcc_assert (!nested_in_vect_loop);
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "extract scalar result");
*************** vect_create_epilog_for_reduction (tree v
*** 1953,1977 ****
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! /* 2.4 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'scalar_initial_def' is zero).
! Create:
! s_out4 = scalar_expr <s_out3, scalar_initial_def> */
!
! if (scalar_initial_def)
{
! tree tmp = build2 (code, scalar_type, new_temp, scalar_initial_def);
! epilog_stmt = build_gimple_modify_stmt (new_scalar_dest, tmp);
! new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
- /* 2.6 Replace uses of s_out0 with uses of s_out3 */
! /* Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
--- 2232,2273 ----
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! vect_finalize_reduction:
!
! /* 2.5 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'adjustment_def' is zero). For example, if code is PLUS we create:
! new_temp = loop_exit_def + adjustment_def */
! if (adjustment_def)
{
! if (nested_in_vect_loop)
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
! expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, vectype);
! }
! else
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
! expr = build2 (code, scalar_type, new_temp, adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, scalar_type);
! }
! epilog_stmt = build_gimple_modify_stmt (new_dest, expr);
! new_temp = make_ssa_name (new_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
+ #if 0
+ bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
+ #else
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
+ #endif
}
! /* 2.6 Handle the loop-exit phi */
!
! /* Replace uses of s_out0 with uses of s_out3:
! Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
*************** vect_create_epilog_for_reduction (tree v
*** 1985,1990 ****
--- 2281,2309 ----
}
/* We expect to have found an exit_phi because of loop-closed-ssa form. */
gcc_assert (exit_phi);
+
+ if (nested_in_vect_loop)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+
+ /* FORNOW. Currently not supporting the case that an inner-loop reduction
+ is not used in the outer-loop (but only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ epilog_stmt = adjustment_def ? epilog_stmt : new_phi;
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt;
+ set_stmt_info (get_stmt_ann (epilog_stmt),
+ new_stmt_vec_info (epilog_stmt, loop_vinfo));
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of partial results after inner-loop:");
+ print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM);
+ }
+ return;
+ }
+
/* Replace the uses: */
orig_name = PHI_RESULT (exit_phi);
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
*************** vectorizable_reduction (tree stmt, block
*** 2066,2080 ****
tree new_stmt = NULL_TREE;
int j;
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT_P (stmt_info))
return false;
! if (!STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
--- 2385,2414 ----
tree new_stmt = NULL_TREE;
int j;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ /* FORNOW. This restriction should be relaxed. */
+ if (ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+ }
+
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
return false;
! /* Reductions that are not used even in an enclosing outer-loop
! are expected to be "live" (used out of the loop). */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_unused_in_loop
! && !STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
*************** vectorizable_reduction (tree stmt, block
*** 2131,2139 ****
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
--- 2465,2473 ----
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
*************** vectorizable_call (tree stmt, block_stmt
*** 2358,2363 ****
--- 2692,2698 ----
int nunits_in;
int nunits_out;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree fndecl, rhs, new_temp, def, def_stmt, rhs_type, lhs_type;
enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
tree new_stmt;
*************** vectorizable_call (tree stmt, block_stmt
*** 2467,2472 ****
--- 2802,2815 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!vec_stmt) /* transformation not required. */
{
STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
*************** vectorizable_call (tree stmt, block_stmt
*** 2481,2486 ****
--- 2824,2837 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "transform operation.");
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Handle def. */
scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
*************** vectorizable_conversion (tree stmt, bloc
*** 2672,2677 ****
--- 3023,3029 ----
tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
tree new_temp;
*************** vectorizable_conversion (tree stmt, bloc
*** 2753,2758 ****
--- 3105,3118 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Check the operands of the operation. */
if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
{
*************** vectorizable_operation (tree stmt, block
*** 3094,3099 ****
--- 3454,3460 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code;
enum machine_mode vec_mode;
tree new_temp;
*************** vectorizable_operation (tree stmt, block
*** 3112,3117 ****
--- 3473,3485 ----
int j;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_type_demotion (tree stmt, b
*** 3374,3379 ****
--- 3742,3748 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK;
tree new_temp;
tree def, def_stmt;
*************** vectorizable_type_demotion (tree stmt, b
*** 3426,3431 ****
--- 3795,3807 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_type_promotion (tree stmt,
*** 3523,3528 ****
--- 3899,3905 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
int op_type;
*************** vectorizable_type_promotion (tree stmt,
*** 3576,3581 ****
--- 3953,3965 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_store (tree stmt, block_stm
*** 3868,3876 ****
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
enum machine_mode vec_mode;
tree dummy;
! enum dr_alignment_support alignment_support_cheme;
ssa_op_iter iter;
def_operand_p def_p;
tree def, def_stmt;
--- 4252,4261 ----
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum machine_mode vec_mode;
tree dummy;
! enum dr_alignment_support alignment_support_scheme;
ssa_op_iter iter;
def_operand_p def_p;
tree def, def_stmt;
*************** vectorizable_store (tree stmt, block_stm
*** 3884,3891 ****
--- 4269,4286 ----
bool strided_store = false;
unsigned int group_size, i;
VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL;
+ bool inv_p;
+
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_store (tree stmt, block_stm
*** 3951,3956 ****
--- 4346,4354 ----
DR_GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))++;
+ /* FORNOW */
+ gcc_assert (!nested_in_vect_loop_p (loop, stmt));
+
/* We vectorize all the stmts of the interleaving group when we
reach the last stmt in the group. */
if (DR_GROUP_STORE_COUNT (vinfo_for_stmt (first_stmt))
*************** vectorizable_store (tree stmt, block_stm
*** 3973,3981 ****
dr_chain = VEC_alloc (tree, heap, group_size);
oprnds = VEC_alloc (tree, heap, group_size);
! alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
! gcc_assert (alignment_support_cheme);
! gcc_assert (alignment_support_cheme == dr_aligned); /* FORNOW */
/* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
--- 4371,4379 ----
dr_chain = VEC_alloc (tree, heap, group_size);
oprnds = VEC_alloc (tree, heap, group_size);
! alignment_support_scheme = vect_supportable_dr_alignment (first_dr);
! gcc_assert (alignment_support_scheme);
! gcc_assert (alignment_support_scheme == dr_aligned); /* FORNOW */
/* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
*************** vectorizable_store (tree stmt, block_stm
*** 4045,4053 ****
VEC_quick_push(tree, oprnds, vec_oprnd);
next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
}
! dataref_ptr = vect_create_data_ref_ptr (first_stmt, bsi, NULL_TREE,
&dummy, &ptr_incr, false,
! TREE_TYPE (vec_oprnd));
}
else
{
--- 4443,4452 ----
VEC_quick_push(tree, oprnds, vec_oprnd);
next_stmt = DR_GROUP_NEXT_DR (vinfo_for_stmt (next_stmt));
}
! dataref_ptr = vect_create_data_ref_ptr (first_stmt, NULL, NULL_TREE,
&dummy, &ptr_incr, false,
! TREE_TYPE (vec_oprnd), &inv_p);
! gcc_assert (!inv_p);
}
else
{
*************** vectorizable_store (tree stmt, block_stm
*** 4065,4071 ****
VEC_replace(tree, dr_chain, i, vec_oprnd);
VEC_replace(tree, oprnds, i, vec_oprnd);
}
! dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
}
if (strided_store)
--- 4464,4471 ----
VEC_replace(tree, dr_chain, i, vec_oprnd);
VEC_replace(tree, oprnds, i, vec_oprnd);
}
! dataref_ptr =
! bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt, NULL_TREE);
}
if (strided_store)
*************** vectorizable_store (tree stmt, block_stm
*** 4125,4131 ****
if (!next_stmt)
break;
/* Bump the vector pointer. */
! dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
}
}
--- 4525,4532 ----
if (!next_stmt)
break;
/* Bump the vector pointer. */
! dataref_ptr =
! bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt, NULL_TREE);
}
}
*************** vectorizable_store (tree stmt, block_stm
*** 4136,4149 ****
/* Function vect_setup_realignment
This function is called when vectorizing an unaligned load using
! the dr_unaligned_software_pipeline scheme.
This function generates the following code at the loop prolog:
p = initial_addr;
! msq_init = *(floor(p)); # prolog load
realignment_token = call target_builtin;
loop:
! msq = phi (msq_init, ---)
The code above sets up a new (vector) pointer, pointing to the first
location accessed by STMT, and a "floor-aligned" load using that pointer.
--- 4537,4553 ----
/* Function vect_setup_realignment
This function is called when vectorizing an unaligned load using
! the dr_explicit_realign[_optimized] scheme.
This function generates the following code at the loop prolog:
p = initial_addr;
! x msq_init = *(floor(p)); # prolog load
realignment_token = call target_builtin;
loop:
! x msq = phi (msq_init, ---)
!
! The stmts marked with x are generated only for the case of
! dr_explicit_realign_optimized.
The code above sets up a new (vector) pointer, pointing to the first
location accessed by STMT, and a "floor-aligned" load using that pointer.
*************** vectorizable_store (tree stmt, block_stm
*** 4152,4170 ****
whose arguments are the result of the prolog-load (created by this
function) and the result of a load that takes place in the loop (to be
created by the caller to this function).
The caller to this function uses the phi-result (msq) to create the
realignment code inside the loop, and sets up the missing phi argument,
as follows:
-
loop:
msq = phi (msq_init, lsq)
lsq = *(floor(p')); # load in loop
result = realign_load (msq, lsq, realignment_token);
Input:
STMT - (scalar) load stmt to be vectorized. This load accesses
a memory location that may be unaligned.
BSI - place where new code is to be inserted.
Output:
REALIGNMENT_TOKEN - the result of a call to the builtin_mask_for_load
--- 4556,4584 ----
whose arguments are the result of the prolog-load (created by this
function) and the result of a load that takes place in the loop (to be
created by the caller to this function).
+
+ For the case of dr_explicit_realign_optimized:
The caller to this function uses the phi-result (msq) to create the
realignment code inside the loop, and sets up the missing phi argument,
as follows:
loop:
msq = phi (msq_init, lsq)
lsq = *(floor(p')); # load in loop
result = realign_load (msq, lsq, realignment_token);
+ For the case of dr_explicit_realign:
+ loop:
+ msq = *(floor(p)); # load in loop
+ p' = p + (VS-1);
+ lsq = *(floor(p')); # load in loop
+ result = realign_load (msq, lsq, realignment_token);
+
Input:
STMT - (scalar) load stmt to be vectorized. This load accesses
a memory location that may be unaligned.
BSI - place where new code is to be inserted.
+ ALIGNMENT_SUPPORT_SCHEME - which of the two misalignment handling schemes
+ is used.
Output:
REALIGNMENT_TOKEN - the result of a call to the builtin_mask_for_load
*************** vectorizable_store (tree stmt, block_stm
*** 4173,4217 ****
static tree
vect_setup_realignment (tree stmt, block_stmt_iterator *bsi,
! tree *realignment_token)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
! edge pe = loop_preheader_edge (loop);
tree scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
tree vec_dest;
- tree init_addr;
tree inc;
tree ptr;
tree data_ref;
tree new_stmt;
basic_block new_bb;
! tree msq_init;
tree new_temp;
tree phi_stmt;
! tree msq;
! /* 1. Create msq_init = *(floor(p1)) in the loop preheader */
! vec_dest = vect_create_destination_var (scalar_dest, vectype);
! ptr = vect_create_data_ref_ptr (stmt, bsi, NULL_TREE, &init_addr, &inc, true,
! NULL_TREE);
! data_ref = build1 (ALIGN_INDIRECT_REF, vectype, ptr);
! new_stmt = build_gimple_modify_stmt (vec_dest, data_ref);
! new_temp = make_ssa_name (vec_dest, new_stmt);
! GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
! new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
! gcc_assert (!new_bb);
! msq_init = GIMPLE_STMT_OPERAND (new_stmt, 0);
! copy_virtual_operands (new_stmt, stmt);
! update_vuses_to_preheader (new_stmt, loop);
- /* 2. Create permutation mask, if required, in loop preheader. */
if (targetm.vectorize.builtin_mask_for_load)
{
tree builtin_decl;
builtin_decl = targetm.vectorize.builtin_mask_for_load ();
new_stmt = build_call_expr (builtin_decl, 1, init_addr);
vec_dest = vect_create_destination_var (scalar_dest,
--- 4587,4733 ----
static tree
vect_setup_realignment (tree stmt, block_stmt_iterator *bsi,
! tree *realignment_token,
! enum dr_alignment_support alignment_support_scheme,
! tree init_addr,
! struct loop **at_loop)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
! edge pe;
tree scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
tree vec_dest;
tree inc;
tree ptr;
tree data_ref;
tree new_stmt;
basic_block new_bb;
! tree msq_init = NULL_TREE;
tree new_temp;
tree phi_stmt;
! tree msq = NULL_TREE;
! tree stmts = NULL_TREE;
! bool inv_p;
! bool compute_in_loop = false;
! bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
! struct loop *containing_loop = (bb_for_stmt (stmt))->loop_father;
! struct loop *loop_for_initial_load;
!
! gcc_assert (alignment_support_scheme == dr_explicit_realign
! || alignment_support_scheme == dr_explicit_realign_optimized);
!
! /* We need to generate three things:
! 1. the misalignment computation
! 2. the extra vector load (for the optimized realignment scheme).
! 3. the phi node for the two vectors from which the realignment is
! done (for the optimized realignment scheme).
! */
!
! /* 1. Determine where to generate the misalignment computation.
!
! If INIT_ADDR is NULL_TREE, this indicates that the misalignment
! calculation will be generated by this function, outside the loop (in the
! preheader). Otherwise, INIT_ADDR had already been computed for us by the
! caller, inside the loop.
!
! Background: If the misalignment remains fixed throughout the iterations of
! the loop, then both realignment schemes are applicable, and also the
! misalignment computation can be done outside LOOP. This is because we are
! vectorizing LOOP, and so the memory accesses in LOOP advance in steps that
! are a multiple of VS (the Vector Size), and therefore the misalignment in
! different vectorized LOOP iterations is always the same.
! The problem arises only if the memory access is in an inner-loop nested
! inside LOOP, which is now being vectorized using outer-loop vectorization.
! This is the only case when the misalignment of the memory access may not
! remain fixed throughout the iterations of the inner-loop (as explained in
! detail in vect_supportable_dr_alignment). In this case, not only is the
! optimized realignment scheme not applicable, but also the misalignment
! computation (and generation of the realignment token that is passed to
! REALIGN_LOAD) have to be done inside the loop.
!
! In short, INIT_ADDR indicates whether we are in a COMPUTE_IN_LOOP mode
! or not, which in turn determines if the misalignment is computed inside
! the inner-loop, or outside LOOP. */
!
! if (init_addr != NULL_TREE)
! {
! compute_in_loop = true;
! gcc_assert (alignment_support_scheme == dr_explicit_realign);
! }
!
!
! /* 2. Determine where to generate the extra vector load.
!
! For the optimized realignment scheme, instead of generating two vector
! loads in each iteration, we generate a single extra vector load in the
! preheader of the loop, and in each iteration reuse the result of the
! vector load from the previous iteration. In case the memory access is in
! an inner-loop nested inside LOOP, which is now being vectorized using
! outer-loop vectorization, we need to determine whether this initial vector
! load should be generated at the preheader of the inner-loop, or can be
! generated at the preheader of LOOP. If the memory access has no evolution
! in LOOP, it can be generated in the preheader of LOOP. Otherwise, it has
! to be generated inside LOOP (in the preheader of the inner-loop). */
!
! if (nested_in_vect_loop)
! {
! tree outerloop_step = STMT_VINFO_DR_STEP (stmt_info);
! bool invariant_in_outerloop =
! (tree_int_cst_compare (outerloop_step, size_zero_node) == 0);
! loop_for_initial_load = (invariant_in_outerloop ? loop : loop->inner);
! }
! else
! loop_for_initial_load = loop;
! if (at_loop)
! *at_loop = loop_for_initial_load;
! /* 3. For the case of the optimized realignment, create the first vector
! load at the loop preheader. */
!
! if (alignment_support_scheme == dr_explicit_realign_optimized)
! {
! /* Create msq_init = *(floor(p1)) in the loop preheader */
!
! gcc_assert (!compute_in_loop);
! pe = loop_preheader_edge (loop_for_initial_load);
! vec_dest = vect_create_destination_var (scalar_dest, vectype);
! ptr = vect_create_data_ref_ptr (stmt, loop_for_initial_load, NULL_TREE,
! &init_addr, &inc, true, NULL_TREE, &inv_p);
! data_ref = build1 (ALIGN_INDIRECT_REF, vectype, ptr);
! new_stmt = build_gimple_modify_stmt (vec_dest, data_ref);
! new_temp = make_ssa_name (vec_dest, new_stmt);
! GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
! new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
! gcc_assert (!new_bb);
! msq_init = GIMPLE_STMT_OPERAND (new_stmt, 0);
! copy_virtual_operands (new_stmt, stmt);
! update_vuses_to_preheader (new_stmt, loop_for_initial_load);
! }
!
!
! /* 4. Create realignment token using a target builtin, if available.
! It is done either inside the containing loop, or before LOOP (as
! determined above). */
if (targetm.vectorize.builtin_mask_for_load)
{
tree builtin_decl;
+ /* Compute INIT_ADDR - the initial address accessed by this memref. */
+ if (compute_in_loop)
+ gcc_assert (init_addr); /* already computed by the caller. */
+ else
+ {
+ /* Generate the INIT_ADDR computation outside LOOP. */
+ init_addr = vect_create_addr_base_for_vector_ref (stmt, &stmts,
+ NULL_TREE, loop);
+ pe = loop_preheader_edge (loop);
+ new_bb = bsi_insert_on_edge_immediate (pe, stmts);
+ gcc_assert (!new_bb);
+ }
+
builtin_decl = targetm.vectorize.builtin_mask_for_load ();
new_stmt = build_call_expr (builtin_decl, 1, init_addr);
vec_dest = vect_create_destination_var (scalar_dest,
*************** vect_setup_realignment (tree stmt, block
*** 4219,4226 ****
new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
new_temp = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
! new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
! gcc_assert (!new_bb);
*realignment_token = GIMPLE_STMT_OPERAND (new_stmt, 0);
/* The result of the CALL_EXPR to this builtin is determined from
--- 4735,4751 ----
new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
new_temp = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
!
! if (compute_in_loop)
! bsi_insert_before (bsi, new_stmt, BSI_SAME_STMT);
! else
! {
! /* Generate the misalignment computation outside LOOP. */
! pe = loop_preheader_edge (loop);
! new_bb = bsi_insert_on_edge_immediate (pe, new_stmt);
! gcc_assert (!new_bb);
! }
!
*realignment_token = GIMPLE_STMT_OPERAND (new_stmt, 0);
/* The result of the CALL_EXPR to this builtin is determined from
*************** vect_setup_realignment (tree stmt, block
*** 4231,4242 ****
gcc_assert (TREE_READONLY (builtin_decl));
}
! /* 3. Create msq = phi <msq_init, lsq> in loop */
vec_dest = vect_create_destination_var (scalar_dest, vectype);
msq = make_ssa_name (vec_dest, NULL_TREE);
! phi_stmt = create_phi_node (msq, loop->header);
SSA_NAME_DEF_STMT (msq) = phi_stmt;
! add_phi_arg (phi_stmt, msq_init, loop_preheader_edge (loop));
return msq;
}
--- 4756,4776 ----
gcc_assert (TREE_READONLY (builtin_decl));
}
! if (alignment_support_scheme == dr_explicit_realign)
! return msq;
!
! gcc_assert (!compute_in_loop);
! gcc_assert (alignment_support_scheme == dr_explicit_realign_optimized);
!
!
! /* 5. Create msq = phi <msq_init, lsq> in loop */
!
! pe = loop_preheader_edge (containing_loop);
vec_dest = vect_create_destination_var (scalar_dest, vectype);
msq = make_ssa_name (vec_dest, NULL_TREE);
! phi_stmt = create_phi_node (msq, containing_loop->header);
SSA_NAME_DEF_STMT (msq) = phi_stmt;
! add_phi_arg (phi_stmt, msq_init, pe);
return msq;
}
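
For a concrete feel of the realignment code set up here, a sketch in portable C (illustration only; memcpy stands in for the ALIGN_INDIRECT_REF loads and the REALIGN_LOAD/mask builtins, 16-byte vectors are assumed, p is assumed genuinely misaligned, and the over-read of the trailing aligned chunk is assumed safe, as the vectorizer arranges): the unaligned value at p is composed from msq = *(floor(p)) and lsq = *(floor(p + VS - 1)), shifted by p's offset within its aligned chunk, which is the quantity the realignment token encodes. In the optimized scheme the lsq of one iteration feeds the msq phi of the next, so only one aligned load remains in the loop.

#include <stdint.h>
#include <string.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Illustration only: emulate result = realign_load (msq, lsq, token) byte-wise.
   Assumes p is misaligned (otherwise this scheme is not used at all).  */
static v4si
emulated_realign_load (const int *p)
{
  uintptr_t a = (uintptr_t) p;
  const char *floor_p = (const char *) (a & ~(uintptr_t) 15);
  size_t off = a & 15;                     /* role played by the realignment token */
  char msq[16], lsq[16], out[16];

  memcpy (msq, floor_p, 16);               /* msq = *(floor(p))                    */
  memcpy (lsq, floor_p + 16, 16);          /* lsq = *(floor(p + VS - 1))           */
  memcpy (out, msq + off, 16 - off);       /* tail of msq ...                      */
  memcpy (out + (16 - off), lsq, off);     /* ... followed by head of lsq          */

  v4si res;
  memcpy (&res, out, 16);
  return res;
}
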
*************** vectorizable_load (tree stmt, block_stmt
*** 4526,4538 ****
stmt_vec_info prev_stmt_info;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree new_temp;
int mode;
tree new_stmt = NULL_TREE;
tree dummy;
! enum dr_alignment_support alignment_support_cheme;
tree dataref_ptr = NULL_TREE;
tree ptr_incr;
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
--- 5060,5074 ----
stmt_vec_info prev_stmt_info;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ struct loop *containing_loop = (bb_for_stmt (stmt))->loop_father;
+ bool nested_in_vect_loop = nested_in_vect_loop_p (loop, stmt);
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
tree new_temp;
int mode;
tree new_stmt = NULL_TREE;
tree dummy;
! enum dr_alignment_support alignment_support_scheme;
tree dataref_ptr = NULL_TREE;
tree ptr_incr;
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
*************** vectorizable_load (tree stmt, block_stmt
*** 4541,4550 ****
tree msq = NULL_TREE, lsq;
tree offset = NULL_TREE;
tree realignment_token = NULL_TREE;
! tree phi_stmt = NULL_TREE;
VEC(tree,heap) *dr_chain = NULL;
bool strided_load = false;
tree first_stmt;
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
--- 5077,5100 ----
tree msq = NULL_TREE, lsq;
tree offset = NULL_TREE;
tree realignment_token = NULL_TREE;
! tree phi = NULL_TREE;
VEC(tree,heap) *dr_chain = NULL;
bool strided_load = false;
tree first_stmt;
+ tree scalar_type;
+ bool inv_p;
+ bool compute_in_loop = false;
+ struct loop *at_loop;
+
+ gcc_assert (ncopies >= 1);
+
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_load (tree stmt, block_stmt
*** 4577,4582 ****
--- 5127,5133 ----
if (!STMT_VINFO_DATA_REF (stmt_info))
return false;
+ scalar_type = TREE_TYPE (DR_REF (dr));
mode = (int) TYPE_MODE (vectype);
/* FORNOW. In some cases can vectorize even if data-type not supported
*************** vectorizable_load (tree stmt, block_stmt
*** 4592,4597 ****
--- 5143,5150 ----
if (DR_GROUP_FIRST_DR (stmt_info))
{
strided_load = true;
+ /* FORNOW */
+ gcc_assert (! nested_in_vect_loop);
/* Check if interleaving is supported. */
if (!vect_strided_load_supported (vectype))
*************** vectorizable_load (tree stmt, block_stmt
*** 4630,4638 ****
group_size = 1;
}
! alignment_support_cheme = vect_supportable_dr_alignment (first_dr);
! gcc_assert (alignment_support_cheme);
!
/* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
--- 5183,5190 ----
group_size = 1;
}
! alignment_support_scheme = vect_supportable_dr_alignment (first_dr);
! gcc_assert (alignment_support_scheme);
/* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
*************** vectorizable_load (tree stmt, block_stmt
*** 4714,4720 ****
}
Otherwise, the data reference is potentially unaligned on a target that
! does not support unaligned accesses (dr_unaligned_software_pipeline) -
then generate the following code, in which the data in each iteration is
obtained by two vector loads, one from the previous iteration, and one
from the current iteration:
--- 5266,5272 ----
}
Otherwise, the data reference is potentially unaligned on a target that
! does not support unaligned accesses (dr_explicit_realign_optimized) -
then generate the following code, in which the data in each iteration is
obtained by two vector loads, one from the previous iteration, and one
from the current iteration:
*************** vectorizable_load (tree stmt, block_stmt
*** 4731,4757 ****
msq = lsq;
} */
! if (alignment_support_cheme == dr_unaligned_software_pipeline)
! {
! msq = vect_setup_realignment (first_stmt, bsi, &realignment_token);
! phi_stmt = SSA_NAME_DEF_STMT (msq);
! offset = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
}
prev_stmt_info = NULL;
for (j = 0; j < ncopies; j++)
{
/* 1. Create the vector pointer update chain. */
if (j == 0)
! dataref_ptr = vect_create_data_ref_ptr (first_stmt, bsi, offset, &dummy,
! &ptr_incr, false, NULL_TREE);
else
! dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
for (i = 0; i < group_size; i++)
{
/* 2. Create the vector-load in the loop. */
! switch (alignment_support_cheme)
{
case dr_aligned:
gcc_assert (aligned_access_p (first_dr));
--- 5283,5334 ----
msq = lsq;
} */
! /* If the misalignment remains the same throughout the execution of the
! loop, we can create the init_addr and permutation mask at the loop
! preheader. Otherwise, it needs to be created inside the loop.
! This can only occur when vectorizing memory accesses in the inner-loop
! nested within an outer-loop that is being vectorized. */
!
! if (nested_in_vect_loop_p (loop, stmt)
! && (TREE_INT_CST_LOW (DR_STEP (dr)) % UNITS_PER_SIMD_WORD != 0))
! {
! gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
! compute_in_loop = true;
! }
!
! if ((alignment_support_scheme == dr_explicit_realign_optimized
! || alignment_support_scheme == dr_explicit_realign)
! && !compute_in_loop)
! {
! msq = vect_setup_realignment (first_stmt, bsi, &realignment_token,
! alignment_support_scheme, NULL_TREE,
! &at_loop);
! if (alignment_support_scheme == dr_explicit_realign_optimized)
! {
! phi = SSA_NAME_DEF_STMT (msq);
! offset = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
! }
}
+ else
+ at_loop = loop;
prev_stmt_info = NULL;
for (j = 0; j < ncopies; j++)
{
/* 1. Create the vector pointer update chain. */
if (j == 0)
! dataref_ptr = vect_create_data_ref_ptr (first_stmt,
! at_loop, offset,
! &dummy, &ptr_incr, false,
! NULL_TREE, &inv_p);
else
! dataref_ptr =
! bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt, NULL_TREE);
for (i = 0; i < group_size; i++)
{
/* 2. Create the vector-load in the loop. */
! switch (alignment_support_scheme)
{
case dr_aligned:
gcc_assert (aligned_access_p (first_dr));
*************** vectorizable_load (tree stmt, block_stmt
*** 4762,4775 ****
int mis = DR_MISALIGNMENT (first_dr);
tree tmis = (mis == -1 ? size_zero_node : size_int (mis));
- gcc_assert (!aligned_access_p (first_dr));
tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT));
data_ref =
build2 (MISALIGNED_INDIRECT_REF, vectype, dataref_ptr, tmis);
break;
}
! case dr_unaligned_software_pipeline:
! gcc_assert (!aligned_access_p (first_dr));
data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr);
break;
default:
--- 5339,5377 ----
int mis = DR_MISALIGNMENT (first_dr);
tree tmis = (mis == -1 ? size_zero_node : size_int (mis));
tmis = size_binop (MULT_EXPR, tmis, size_int(BITS_PER_UNIT));
data_ref =
build2 (MISALIGNED_INDIRECT_REF, vectype, dataref_ptr, tmis);
break;
}
! case dr_explicit_realign:
! {
! tree ptr, bump;
! tree vs_minus_1 = size_int (TYPE_VECTOR_SUBPARTS (vectype) - 1);
!
! if (compute_in_loop)
! msq = vect_setup_realignment (first_stmt, bsi,
! &realignment_token,
! dr_explicit_realign,
! dataref_ptr, NULL);
!
! data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr);
! vec_dest = vect_create_destination_var (scalar_dest, vectype);
! new_stmt = build_gimple_modify_stmt (vec_dest, data_ref);
! new_temp = make_ssa_name (vec_dest, new_stmt);
! GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
! vect_finish_stmt_generation (stmt, new_stmt, bsi);
! copy_virtual_operands (new_stmt, stmt);
! mark_symbols_for_renaming (new_stmt);
! msq = new_temp;
!
! bump = size_binop (MULT_EXPR, vs_minus_1,
! TYPE_SIZE_UNIT (scalar_type));
! ptr = bump_vector_ptr (dataref_ptr, NULL_TREE, bsi, stmt, bump);
! data_ref = build1 (ALIGN_INDIRECT_REF, vectype, ptr);
! break;
! }
! case dr_explicit_realign_optimized:
data_ref = build1 (ALIGN_INDIRECT_REF, vectype, dataref_ptr);
break;
default:
*************** vectorizable_load (tree stmt, block_stmt
*** 4783,4811 ****
copy_virtual_operands (new_stmt, stmt);
mark_symbols_for_renaming (new_stmt);
! /* 3. Handle explicit realignment if necessary/supported. */
! if (alignment_support_cheme == dr_unaligned_software_pipeline)
{
- /* Create in loop:
- <vec_dest = realign_load (msq, lsq, realignment_token)> */
lsq = GIMPLE_STMT_OPERAND (new_stmt, 0);
if (!realignment_token)
realignment_token = dataref_ptr;
vec_dest = vect_create_destination_var (scalar_dest, vectype);
! new_stmt =
! build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq, realignment_token);
new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
new_temp = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
vect_finish_stmt_generation (stmt, new_stmt, bsi);
! if (i == group_size - 1 && j == ncopies - 1)
! add_phi_arg (phi_stmt, lsq, loop_latch_edge (loop));
! msq = lsq;
}
if (strided_load)
VEC_quick_push (tree, dr_chain, new_temp);
if (i < group_size - 1)
! dataref_ptr = bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt);
}
if (strided_load)
--- 5385,5454 ----
copy_virtual_operands (new_stmt, stmt);
mark_symbols_for_renaming (new_stmt);
! /* 3. Handle explicit realignment if necessary/supported. Create in
! loop: vec_dest = realign_load (msq, lsq, realignment_token) */
! if (alignment_support_scheme == dr_explicit_realign_optimized
! || alignment_support_scheme == dr_explicit_realign)
{
lsq = GIMPLE_STMT_OPERAND (new_stmt, 0);
if (!realignment_token)
realignment_token = dataref_ptr;
vec_dest = vect_create_destination_var (scalar_dest, vectype);
! new_stmt = build3 (REALIGN_LOAD_EXPR, vectype, msq, lsq,
! realignment_token);
new_stmt = build_gimple_modify_stmt (vec_dest, new_stmt);
new_temp = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
vect_finish_stmt_generation (stmt, new_stmt, bsi);
!
! if (alignment_support_scheme == dr_explicit_realign_optimized)
! {
! if (i == group_size - 1 && j == ncopies - 1)
! add_phi_arg (phi, lsq, loop_latch_edge (containing_loop));
! msq = lsq;
! }
}
+
+ /* 4. Handle invariant-load. */
+ if (inv_p)
+ {
+ gcc_assert (!strided_load);
+ gcc_assert (nested_in_vect_loop_p (loop, stmt));
+ if (j == 0)
+ {
+ int k;
+ tree t = NULL_TREE;
+ tree vec_inv, bitpos, bitsize = TYPE_SIZE (scalar_type);
+
+ /* CHECKME: bitpos depends on endianness? */
+ bitpos = bitsize_zero_node;
+ vec_inv = build3 (BIT_FIELD_REF, scalar_type, new_temp,
+ bitsize, bitpos);
+ BIT_FIELD_REF_UNSIGNED (vec_inv) =
+ TYPE_UNSIGNED (scalar_type);
+ vec_dest =
+ vect_create_destination_var (scalar_dest, NULL_TREE);
+ new_stmt = build_gimple_modify_stmt (vec_dest, vec_inv);
+ new_temp = make_ssa_name (vec_dest, new_stmt);
+ GIMPLE_STMT_OPERAND (new_stmt, 0) = new_temp;
+ vect_finish_stmt_generation (stmt, new_stmt, bsi);
+
+ for (k = nunits - 1; k >= 0; --k)
+ t = tree_cons (NULL_TREE, new_temp, t);
+ /* FIXME: use build_constructor directly. */
+ vec_inv = build_constructor_from_list (vectype, t);
+ new_temp = vect_init_vector (stmt, vec_inv, vectype, bsi);
+ new_stmt = SSA_NAME_DEF_STMT (new_temp);
+ }
+ else
+ gcc_unreachable (); /* FORNOW; FIXME. */
+ }
+
if (strided_load)
VEC_quick_push (tree, dr_chain, new_temp);
if (i < group_size - 1)
! dataref_ptr =
! bump_vector_ptr (dataref_ptr, ptr_incr, bsi, stmt, NULL_TREE);
}
if (strided_load)
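
The invariant-load handling above boils down to a broadcast: when the vectorizer finds the load invariant (the inv_p case, e.g. an inner-loop access whose address does not change across the outer-loop iterations being vectorized), the scalar value is extracted once (the BIT_FIELD_REF) and then splat into every lane (the constructor). Roughly, in GNU C terms (illustration only; 4 x int lanes and the helper name are assumptions):

typedef int v4si __attribute__ ((vector_size (16)));

/* Illustration only: splat one invariant scalar into all vector lanes.  */
static inline v4si
splat_invariant (int scalar)
{
  v4si vec_inv = { scalar, scalar, scalar, scalar };
  return vec_inv;
}
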
*************** vectorizable_live_operation (tree stmt,
*** 4842,4847 ****
--- 5485,5491 ----
tree operation;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
int i;
int op_type;
tree op;
*************** vectorizable_live_operation (tree stmt,
*** 4859,4864 ****
--- 5503,5512 ----
if (TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 0)) != SSA_NAME)
return false;
+ /* FORNOW. CHECKME. */
+ if (nested_in_vect_loop_p (loop, stmt))
+ return false;
+
operation = GIMPLE_STMT_OPERAND (stmt, 1);
op_type = TREE_OPERAND_LENGTH (operation);
*************** vect_gen_niters_for_prolog_loop (loop_ve
*** 5643,5650 ****
else
{
tree new_stmts = NULL_TREE;
! tree start_addr =
! vect_create_addr_base_for_vector_ref (dr_stmt, &new_stmts, NULL_TREE);
tree ptr_type = TREE_TYPE (start_addr);
tree size = TYPE_SIZE (ptr_type);
tree type = lang_hooks.types.type_for_size (tree_low_cst (size, 1), 1);
--- 6291,6298 ----
else
{
tree new_stmts = NULL_TREE;
! tree start_addr = vect_create_addr_base_for_vector_ref (dr_stmt,
! &new_stmts, NULL_TREE, loop);
tree ptr_type = TREE_TYPE (start_addr);
tree size = TYPE_SIZE (ptr_type);
tree type = lang_hooks.types.type_for_size (tree_low_cst (size, 1), 1);
*************** static tree
*** 5817,5822 ****
--- 6465,6471 ----
vect_create_cond_for_align_checks (loop_vec_info loop_vinfo,
tree *cond_expr_stmt_list)
{
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
VEC(tree,heap) *may_misalign_stmts
= LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo);
tree ref_stmt, tmp;
*************** vect_create_cond_for_align_checks (loop_
*** 5852,5859 ****
/* create: addr_tmp = (int)(address_of_first_vector) */
addr_base = vect_create_addr_base_for_vector_ref (ref_stmt,
! &new_stmt_list,
! NULL_TREE);
if (new_stmt_list != NULL_TREE)
append_to_statement_list_force (new_stmt_list, cond_expr_stmt_list);
--- 6501,6507 ----
/* create: addr_tmp = (int)(address_of_first_vector) */
addr_base = vect_create_addr_base_for_vector_ref (ref_stmt,
! &new_stmt_list, NULL_TREE, loop);
if (new_stmt_list != NULL_TREE)
append_to_statement_list_force (new_stmt_list, cond_expr_stmt_list);
*************** vect_transform_loop (loop_vec_info loop_
*** 6067,6074 ****
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
stmt_info = vinfo_for_stmt (stmt);
! gcc_assert (stmt_info);
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
--- 6715,6732 ----
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
+
stmt_info = vinfo_for_stmt (stmt);
!
! /* vector stmts created in the outer-loop during vectorization of
! stmts in an inner-loop may not have a stmt_info, and do not
! need to be vectorized. */
! if (!stmt_info)
! {
! bsi_next (&si);
! continue;
! }
!
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
*************** vect_transform_loop (loop_vec_info loop_
*** 6140,6143 ****
--- 6798,6803 ----
if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
fprintf (vect_dump, "LOOP VECTORIZED.");
+ if (loop->inner && vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
+ fprintf (vect_dump, "OUTER LOOP VECTORIZED.");
}