* [patch] [4.3 projects] outer-loop vectorization patch 1/n
@ 2007-08-08 21:23 Dorit Nuzman
2007-08-09 11:58 ` Dorit Nuzman
2007-08-12 15:02 ` Dorit Nuzman
0 siblings, 2 replies; 3+ messages in thread
From: Dorit Nuzman @ 2007-08-08 21:23 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 11602 bytes --]
Hi,
This patch is the first part of
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds initial
support for outer-loop vectorization. It basically brings over this patch:
http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html, along with some
fixes that went in later.
This patch can vectorize outer-loops only if there are no memory-references
in the inner-loop.
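To make the restriction concrete, here is a small sketch of the two loop shapes, modeled on the no-scevccp-outer-3.c and no-scevccp-outer-1.c tests in the attachment. The function names outer_candidate and inner_memrefs are made up for illustration. Whether the first nest is really vectorized still depends on the target and on the flags the testsuite passes.

```c
#define N 40

int a[N];
signed short image[N][N];
signed short block[N][N];

/* Candidate for outer-loop vectorization under this patch: the inner
   loop only touches scalars (sum, j), and the store to a[i] sits in
   the outer loop.  Modeled on no-scevccp-outer-3.c.  */
void
outer_candidate (void)
{
  int i, j, sum;

  for (i = 0; i < N; i++) {
    sum = 0;
    for (j = 0; j < N; j += 2)
      sum += j;
    a[i] = sum;
  }
}

/* Not handled by this patch: image[i][j] and block[i][j] are memory
   references inside the inner loop.  Modeled on no-scevccp-outer-1.c,
   which is marked xfail.  */
unsigned int
inner_memrefs (void)
{
  int i, j;
  unsigned int diff = 0;

  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      diff += (image[i][j] - block[i][j]);

  return diff;
}
```

Both functions are ordinary C and compute the same results with or without vectorization; the difference is only in what the vectorizer is expected to accept.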
The patch includes the following changes to the vectorizer:
1) So far we supported single-BB loops (+empty latch), so the order in
which we traversed the loop BBs did not matter. Now it does - we sort the
BBs in DFS order (since we don't allow if's in the loop, this should
guarantee visiting defs before their uses).
2) vect_analyze_loop_form was extended to allow a restricted form of
outer-loops. We currently support doubly-nested loops that consist of a
header, a single inner(most)-loop, a tail, and an empty latch (5 BBs all
together).
3) vect_analyze_loop_form calls a new function - vect_analyze_loop_1 - to
do a few analyses on the inner-loop (currently only one analysis:
analyze_loop_form), and to build a loop_info for the inner-loop. It is
destroyed soon after, but w/o destroying the stmt_info's that were set up
for the inner-loop stmts. Maybe later we'll keep the inner-loop_info
around, if needed.
4) Support for outer-loops breaks the assumption that phi nodes are only in
the loop-header, and represent a scalar-cycle (induction or reduction). In
outer-loops we also have phi-nodes inside the loop - these are the
loop-closed phis after the inner-loop. This required a way to distinguish
between these two kinds of phis (we use 'is_loop_header_bb_p' for that),
and a few small changes in several places:
o new_stmt_vec_info: different def-type initialization for the two kinds of
phis
o vect_is_simple_reduction: the uses that are not the reduction-variable
can now be defined by a phi, though not a loop-header phi.
o vect_recog_dot_prod_pattern: a vect_loop_def might be a phi, and not
necessarily a gimple_modify_stmt.
o vect_get_vec_def_for_oprnd: a vect_loop_def can be a phi node, and not
necessarily a gimple_modify_stmt.
5) the enum "relevant" has two new values -
vect_used_in_outer[_by_reduction], which are propagated during the
mark_relevant pass.
6) since we don't yet support multiple-data-types in the inner-loop, we
check in all relevant places, that this is not the case.
The more significant changes are to vectorization of reduction and
induction. In both cases we need to be aware of whether the
induction/reduction-phi that we are vectorizing is in the same nest that is
being vectorized, or is 'nested_in_vect_loop' (is inside the inner-loop
while vectorizing the outer-loop):
7) vectorization of induction: In get_initial_def_for_induction, if this is
a 'nested_in_vect_loop' case, then:
o the initialization vector can be obtained using
vect_get_vec_def_for_operand (does not need to be built from scratch).
o the vector that holds the step of the vectorized induction is {S,S,S,S}
rather than {VF*S,VF*S,VF*S,VF*S} (where S is the step of the induction),
because in the vectorized inner-loop we are advancing sequentially (though
in parallel for VF outer-loop iterations).
o the final vector for inductions is recorded in the corresponding
loop-exit phi (of the inner-loop) so that we can easily obtain it when we
vectorize stmts in the outer-loop that use it.
8) vectorization of reduction: The main thing here is that we don't need to
reduce the reduction to a single result; the final vector of partial
results will feed the vector operations that may use it in the outer-loop.
So:
o In get_initial_def_for_reduction, we may return a vector for the epilog
adjustment, rather than a scalar.
o epilog_for_reduction - skip the part that computes the final scalar
result in case this is a 'nested_in_vect_loop' case.
o and in vectorizable_reduction, we no longer check that the reduction is
LIVE_P (used outside the loop), because it may be unused outside the
(outer) loop yet used inside the outer-loop (so as far as the inner-loop
reduction is concerned, it is used_in_outer_loop, but not live).
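Items 7 and 8 can be modeled by hand in plain C. The sketch below is not the code the vectorizer generates; it only mimics the data flow, with each "vector" written as an int[VF] array whose lanes belong to VF consecutive outer-loop iterations. VF = 4 and the names scalar_nest and outer_vectorized_model are assumptions made for the illustration, and the model assumes the outer trip count is a multiple of VF.

```c
#include <assert.h>

#define VF 4   /* assumed vectorization factor, for illustration only */

/* Scalar reference: a[i] = sum of j for j = 0, S, 2S, ... < n_inner.  */
void
scalar_nest (int *a, int n_outer, int n_inner, int step)
{
  for (int i = 0; i < n_outer; i++) {
    int sum = 0;
    for (int j = 0; j < n_inner; j += step)
      sum += j;
    a[i] = sum;
  }
}

/* Hand-written model of the outer-loop-vectorized nest.  */
void
outer_vectorized_model (int *a, int n_outer, int n_inner, int step)
{
  assert (n_outer % VF == 0);   /* no epilogue loop in this sketch */

  for (int i = 0; i < n_outer; i += VF) {
    int vec_sum[VF], vec_j[VF], vec_step[VF];

    for (int l = 0; l < VF; l++) {
      vec_sum[l] = 0;       /* reduction initial value */
      vec_j[l] = 0;         /* induction initial value, same in every lane */
      vec_step[l] = step;   /* {S,S,S,S} rather than {VF*S,...} */
    }

    /* The inner loop still advances sequentially; its trip count is
       unchanged, and each lane works for a different outer iteration.  */
    for (int j = 0; j < n_inner; j += step)
      for (int l = 0; l < VF; l++) {
        vec_sum[l] += vec_j[l];
        vec_j[l] += vec_step[l];
      }

    /* No reduction epilogue: the vector of results is consumed as-is
       by the outer-loop statement that uses it (here, the store).  */
    for (int l = 0; l < VF; l++)
      a[i + l] = vec_sum[l];
  }
}
```

Calling both functions with the same arguments (for example 8 outer iterations, an inner bound of 40 and a step of 2) should fill their output arrays with identical values, which is the property the no-scevccp-outer-*.c run tests check against the real vectorizer.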
Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
passed full regression testing on both platforms.
I will wait at least a week to give people a chance to review and comment.
thanks,
dorit
ChangeLog:
* tree-vectorizer.h (vect_is_simple_reduction): Takes a
loop_vec_info
as argument instead of struct loop.
(nested_in_vect_loop_p): New function.
(vect_relevant): Add enum values vect_used_in_outer_by_reduction
and
vect_used_in_outer.
(is_loop_header_bb_p): New. Used to differentiate loop-header phis
from other phis in the loop.
(destroy_loop_vec_info): Add additional argument to declaration.
* tree-vectorizer.c (supportable_widening_operation): Also check if
nested_in_vect_loop_p (don't allow changing the order in this
case).
(vect_is_simple_reduction): Takes a loop_vec_info as argument
instead
of struct loop. Call nested_in_vect_loop_p and don't require
flag_unsafe_math_optimizations if it returns true.
* tree-vectorizer.c (new_stmt_vec_info): When setting def_type for
phis differentiate loop-header phis from other phis.
(bb_in_loop_p): New function.
(new_loop_vec_info): Inner-loop phis already have a stmt_vinfo, so
just
update their loop_vinfo. Order of BB traversal now matters - call
dfs_enumerate_from with bb_in_loop_p.
(destroy_loop_vec_info): Takes additional argument to control
whether
stmt_vinfo of the loop stmts should be destroyed as well.
(vect_is_simple_reduction): Allow the "non-reduction" use of a
reduction stmt to be defined by a non loop-header phi.
(vectorize_loops): Call destroy_loop_vec_info with additional
argument.
* tree-vect-transform.c (vectorizable_reduction): Call
nested_in_vect_loop_p. Check for multitypes in the inner-loop.
(vectorizable_call): Likewise.
(vectorizable_conversion): Likewise.
(vectorizable_operation): Likewise.
(vectorizable_type_promotion): Likewise.
(vectorizable_type_demotion): Likewise.
(vectorizable_store): Likewise.
(vectorizable_live_operation): Likewise.
(vectorizable_reduction): Likewise. Also pass loop_info to
vect_is_simple_reduction instead of loop.
(vect_init_vector): Call nested_in_vect_loop_p.
(get_initial_def_for_reduction): Likewise.
(vect_create_epilog_for_reduction): Likewise.
(vect_init_vector): Check which loop to work with, in case there's
an
inner-loop.
(get_initial_def_for_induction): Extend to handle outer-loop
vectorization. Fix indentation.
(vect_get_vec_def_for_operand): Support phis in the case
vect_loop_def.
In the case vect_induction_def get the vector def from the
induction
phi node, instead of calling get_initial_def_for_induction.
(get_initial_def_for_reduction): Extend to handle outer-loop
vectorization.
(vect_create_epilog_for_reduction): Extend to handle outer-loop
vectorization.
(vect_transform_loop): Change assert to just skip this case. Add a
dump printout.
(vect_finish_stmt_generation): Add a couple asserts.
(vect_estimate_min_profitable_iters): Multiply
cost of inner-loop stmts (in outer-loop vectorization) by estimated
inner-loop bound.
(vect_model_reduction_cost): Don't add reduction epilogue cost in
case
this is an inner-loop reduction in outer-loop vectorization.
* tree-vect-analyze.c (vect_analyze_scalar_cycles_1): New function.
Same code as what used to be vect_analyze_scalar_cycles, only with
additional argument loop, and loop_info passed to
vect_is_simple_reduction instead of loop.
(vect_analyze_scalar_cycles): Code factored out into
vect_analyze_scalar_cycles_1. Call it for each relevant loop-nest.
Updated documentation.
(analyze_operations): Check for inner-loop loop-closed exit-phis
during
outer-loop vectorization that are live or not used in the
outerloop,
cause this requires special handling.
(vect_enhance_data_refs_alignment): Don't consider versioning for
nested-loops.
(vect_analyze_data_refs): Check that there are no datarefs in the
inner-loop.
(vect_mark_stmts_to_be_vectorized): Also consider
vect_used_in_outer
and vect_used_in_outer_by_reduction cases.
(process_use): Also consider the case of outer-loop stmt defining
an
inner-loop stmt and vice versa.
(vect_analyze_loop_1): New function.
(vect_analyze_loop_form): Extend, to allow a restricted form of
nested
loops. Call vect_analyze_loop_1.
(vect_analyze_loop): Skip (inner-)loops within outer-loops that
have
been vectorized. Call destroy_loop_vec_info with additional
argument.
* tree-vect-patterns.c (vect_recog_widen_sum_pattern): Don't allow
in the inner-loop when doing outer-loop vectorization. Add
documentation and printout.
(vect_recog_dot_prod_pattern): Likewise. Also add check for
GIMPLE_MODIFY_STMT (in case we encounter a phi in the loop).
testsuite/ChangeLog:
* gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
and -fno-tree-reassoc.
* gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c: Moved to...
* gcc.dg/vect/no-scevccp-vect-iv-1.c: New test.
* gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c: Moved to...
* gcc.dg/vect/no-scevccp-vect-iv-2.c: New test.
* gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: New test.
* gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: New test.
* gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: New test.
* gcc.dg/vect/no-scevccp-noreassoc-outer-4.c: New test.
* gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: New test.
* gcc.dg/vect/no-scevccp-outer-1.c: New test.
* gcc.dg/vect/no-scevccp-outer-2.c: New test.
* gcc.dg/vect/no-scevccp-outer-3.c: New test.
* gcc.dg/vect/no-scevccp-outer-4.c: New test.
* gcc.dg/vect/no-scevccp-outer-5.c: New test.
* gcc.dg/vect/no-scevccp-outer-6.c: New test.
* gcc.dg/vect/no-scevccp-outer-7.c: New test.
* gcc.dg/vect/no-scevccp-outer-8.c: New test.
* gcc.dg/vect/no-scevccp-outer-9.c: New test.
* gcc.dg/vect/no-scevccp-outer-9a.c: New test.
* gcc.dg/vect/no-scevccp-outer-9b.c: New test.
* gcc.dg/vect/no-scevccp-outer-10.c: New test.
* gcc.dg/vect/no-scevccp-outer-10a.c: New test.
* gcc.dg/vect/no-scevccp-outer-10b.c: New test.
* gcc.dg/vect/no-scevccp-outer-11.c: New test.
* gcc.dg/vect/no-scevccp-outer-12.c: New test.
* gcc.dg/vect/no-scevccp-outer-13.c: New test.
* gcc.dg/vect/no-scevccp-outer-14.c: New test.
* gcc.dg/vect/no-scevccp-outer-15.c: New test.
* gcc.dg/vect/no-scevccp-outer-16.c: New test.
* gcc.dg/vect/no-scevccp-outer-17.c: New test.
* gcc.dg/vect/no-scevccp-outer-18.c: New test.
* gcc.dg/vect/no-scevccp-outer-19.c: New test.
* gcc.dg/vect/no-scevccp-outer-20.c: New test.
* gcc.dg/vect/no-scevccp-outer-21.c: New test.
* gcc.dg/vect/no-scevccp-outer-22.c: New test.
(See attached file: mainlineouterloopdiff1t.txt)
[-- Attachment #2: mainlineouterloopdiff1t.txt --]
[-- Type: text/plain, Size: 130526 bytes --]
Index: testsuite/gcc.dg/vect/vect-widen-mult-sum.c
===================================================================
*** testsuite/gcc.dg/vect/vect-widen-mult-sum.c (revision 127202)
--- testsuite/gcc.dg/vect/vect-widen-mult-sum.c (working copy)
*************** int main (void)
*** 42,45 ****
--- 42,46 ----
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-7.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
***************
*** 0 ****
--- 1,75 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+ unsigned short coef[N];
+ unsigned short a[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j;
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = 2*i;
+ coef[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ if (a[i] != bar ())
+ abort ();
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ j = 0;
+ do {
+ sum += j;
+ } while (++j < n);
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-18.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum;
+ a[2*i+1] = 2*sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum || a[2*i+1] != 2*sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-8.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+
+ int
+ foo (int *a){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (a);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-11.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < n; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
***************
*** 0 ****
--- 1,57 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-19.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short a[N];
+ unsigned int b[N];
+
+ int
+ foo (){
+ unsigned short i,j;
+ unsigned short sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (unsigned int)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ short sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (unsigned int)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-20.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
***************
*** 0 ****
--- 1,23 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N];
+ signed short block[N][N];
+
+ /* memory references in the inner-loop */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-12.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ int a[N];
+ short b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (short)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (short)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* Until we support multiple types in the inner loop */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-21.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ #define N 40
+
+ int
+ foo (){
+ int i,j;
+ int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += j;
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-13.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
***************
*** 0 ****
--- 1,67 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar ();
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-22.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ /* inner-loop index j used after the inner-loop */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += j;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-3.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ /* inner-loop step > 1 */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j+=2) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-14.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
***************
*** 0 ****
--- 1,61 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned short sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += sum_j;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned short sum = 0;
+ unsigned short res;
+
+ check_vect ();
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar();
+ sum += sum_j;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (revision 127202)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (working copy)
***************
*** 1,34 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 26
-
- int main1 (int X)
- {
- int s = X;
- int i;
-
- /* vectorization of reduction with induction.
- Need -fno-tree-scev-cprop or else the loop is eliminated. */
- for (i = 0; i < N; i++)
- s += i;
-
- return s;
- }
-
- int main (void)
- {
- int s;
- check_vect ();
-
- s = main1 (3);
- if (s != 328)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/no-scevccp-outer-4.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
***************
*** 0 ****
--- 1,55 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ /* induction variable k advances through inner and outer loops. */
+
+ int
+ foo (int n){
+ int i,j,k=0;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += k++;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += k++;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-15.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
***************
*** 0 ****
--- 1,48 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int x){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum + i + x;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ foo (3);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum + i + 3)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (revision 127202)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (working copy)
***************
*** 1,49 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 16
-
- int main1 ()
- {
- int arr1[N];
- int k = 0;
- int m = 3, i = 0;
-
- /* Vectorization of induction that is used after the loop.
- Currently vectorizable because scev_ccp disconnects the
- use-after-the-loop from the iv def inside the loop. */
-
- do {
- k = k + 2;
- arr1[i] = k;
- m = m + k;
- i++;
- } while (i < N);
-
- /* check results: */
- for (i = 0; i < N; i++)
- {
- if (arr1[i] != 2+2*i)
- abort ();
- }
-
- return m + k;
- }
-
- int main (void)
- {
- int res;
-
- check_vect ();
-
- res = main1 ();
- if (res != 32 + 275)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/vect.exp
===================================================================
*** testsuite/gcc.dg/vect/vect.exp (revision 127202)
--- testsuite/gcc.dg/vect/vect.exp (working copy)
*************** dg-runtest [lsort [glob -nocomplain $src
*** 176,183 ****
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-scev-cprop-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
--- 176,195 ----
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-vect-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-outer-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop -fno-tree-reassoc
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop" "-fno-tree-reassoc"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-noreassoc-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
Index: testsuite/gcc.dg/vect/no-scevccp-outer-5.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] += sum + i;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ a[i] = i;
+ aa[i] = i;
+ }
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != aa[i] + sum + i)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ j = 0;
+ do {
+ sum += j;
+ }while (++j < n);
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-16.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-6.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
***************
*** 0 ****
--- 1,56 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int
+ foo (int * __restrict__ b, int k){
+ int i,j;
+ int sum,x;
+ int a[N];
+
+ for (i = 0; i < N; i++) {
+ sum = b[i];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+
+ return a[k];
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int b[N];
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i + 2;
+
+ for (i=0; i<N; i++)
+ a[i] = foo (b,i);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = b[i];
+ for (j = 0; j < N; j++){
+ sum += j;
+ }
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-17.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
***************
*** 0 ****
--- 1,68 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+ int c[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ int diff = b[i] - c[i];
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+
+ sum += diff;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ b[i] = i;
+ c[i] = 2*i;
+ }
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += (b[i] - c[i]);
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
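(Illustration only, not part of the patch.) The testcases above all share one shape: an outer loop over i, an inner reduction over j that touches no memory, and a store to a[i] in the outer-loop tail. The sketch below is a hand-written scalar model of what outer-loop vectorization does to the nest from no-scevccp-outer-15.c. It assumes a vectorization factor of 4; `VF`, `foo_scalar`, `foo_outer_vectorized` and `outer_vect_model_matches` are hypothetical names used only here. The real vectorizer emits vector statements and picks VF per target.

```c
#include <assert.h>

#define N 40   /* same trip count as the testcases */
#define VF 4   /* hypothetical vectorization factor; N % VF == 0 */

/* Scalar reference: the loop nest from no-scevccp-outer-15.c.  */
void
foo_scalar (int *a, int x)
{
  int i, j, sum;

  for (i = 0; i < N; i++)
    {
      sum = 0;
      for (j = 0; j < N; j++)
        sum += j;
      a[i] = sum + i + x;
    }
}

/* Model of the outer-loop-vectorized nest: each "lane" stands for one
   of VF consecutive outer-loop iterations.  */
void
foo_outer_vectorized (int *a, int x)
{
  int i, j, lane;

  for (i = 0; i < N; i += VF)
    {
      int vsum[VF] = { 0 };

      /* The inner-loop still runs sequentially over j; every inner
         iteration updates all VF lanes at once.  */
      for (j = 0; j < N; j++)
        for (lane = 0; lane < VF; lane++)
          vsum[lane] += j;

      /* Outer-loop tail: one "vector" store covering VF elements.  */
      for (lane = 0; lane < VF; lane++)
        a[i + lane] = vsum[lane] + (i + lane) + x;
    }
}

/* Returns 1 if both versions compute the same array for X.  */
int
outer_vect_model_matches (int x)
{
  int a_ref[N], a_vec[N], i;

  foo_scalar (a_ref, x);
  foo_outer_vectorized (a_vec, x);
  for (i = 0; i < N; i++)
    if (a_ref[i] != a_vec[i])
      return 0;
  return 1;
}
```

Because the j loop is not reordered, the inner reduction in each lane sees exactly the same sequence of additions as the scalar code.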
Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c (revision 127202)
--- tree-vectorizer.c (working copy)
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1345,1351 ****
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
! if (TREE_CODE (stmt) == PHI_NODE)
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
--- 1345,1351 ----
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
! if (TREE_CODE (stmt) == PHI_NODE && is_loop_header_bb_p (bb_for_stmt (stmt)))
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1364,1369 ****
--- 1364,1383 ----
}
+ /* Function bb_in_loop_p
+
+ Used as predicate for dfs order traversal of the loop bbs. */
+
+ static bool
+ bb_in_loop_p (basic_block bb, void *data)
+ {
+ struct loop *loop = (struct loop *)data;
+ if (flow_bb_inside_loop_p (loop, bb))
+ return true;
+ return false;
+ }
+
+
/* Function new_loop_vec_info.
Create and initialize a new loop_vec_info struct for LOOP, as well as
*************** new_loop_vec_info (struct loop *loop)
*** 1375,1392 ****
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
bbs = get_loop_body (loop);
! /* Create stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
--- 1389,1437 ----
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i, nbbs;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
+ LOOP_VINFO_LOOP (res) = loop;
bbs = get_loop_body (loop);
! /* Create/Update stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
+ /* BBs in a nested inner-loop will have been already processed (because
+ we will have called vect_analyze_loop_form for any nested inner-loop).
+ Therefore, for stmts in an inner-loop we just want to update the
+ STMT_VINFO_LOOP_VINFO field of their stmt_info to point to the new
+ loop_info of the outer-loop we are currently considering to vectorize
+ (instead of the loop_info of the inner-loop).
+ For stmts in other BBs we need to create a stmt_info from scratch. */
+ if (bb->loop_father != loop)
+ {
+ /* Inner-loop bb. */
+ gcc_assert (loop->inner && bb->loop_father == loop->inner);
+ for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
+ {
+ stmt_vec_info stmt_info = vinfo_for_stmt (phi);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+ {
+ tree stmt = bsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ }
+ else
+ {
+ /* bb in current nest. */
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
*************** new_loop_vec_info (struct loop *loop)
*** 1396,1411 ****
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann;
!
! ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
- LOOP_VINFO_LOOP (res) = loop;
LOOP_VINFO_BBS (res) = bbs;
- LOOP_VINFO_EXIT_COND (res) = NULL;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
--- 1441,1464 ----
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
+ }
+
+ /* CHECKME: We want to visit all BBs before their successors (except for
+ latch blocks, for which this assertion wouldn't hold). In the simple
+ case of the loop forms we allow, a dfs order of the BBs would be the same
+ as reversed postorder traversal, so we are safe. */
+
+ free (bbs);
+ bbs = XCNEWVEC (basic_block, loop->num_nodes);
+ nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
+ bbs, loop->num_nodes, loop);
+ gcc_assert (nbbs == loop->num_nodes);
LOOP_VINFO_BBS (res) = bbs;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
*************** new_loop_vec_info (struct loop *loop)
*** 1427,1433 ****
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo)
{
struct loop *loop;
basic_block *bbs;
--- 1480,1486 ----
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
{
struct loop *loop;
basic_block *bbs;
*************** destroy_loop_vec_info (loop_vec_info loo
*** 1443,1448 ****
--- 1496,1513 ----
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
+ if (!clean_stmts)
+ {
+ free (LOOP_VINFO_BBS (loop_vinfo));
+ free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
+ free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
+ VEC_free (tree, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+
+ free (loop_vinfo);
+ loop->aux = NULL;
+ return;
+ }
+
for (j = 0; j < nbbs; j++)
{
basic_block bb = bbs[j];
*************** vect_is_simple_use (tree operand, loop_v
*** 1714,1721 ****
{
case PHI_NODE:
*def = PHI_RESULT (*def_stmt);
- gcc_assert (*dt == vect_induction_def || *dt == vect_reduction_def
- || *dt == vect_invariant_def);
break;
case GIMPLE_MODIFY_STMT:
--- 1779,1784 ----
*************** supportable_widening_operation (enum tre
*** 1756,1761 ****
--- 1819,1826 ----
enum tree_code *code1, enum tree_code *code2)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
bool ordered_p;
enum machine_mode vec_mode;
enum insn_code icode1, icode2;
*************** supportable_widening_operation (enum tre
*** 1778,1786 ****
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8]. */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction)
ordered_p = false;
else
ordered_p = true;
--- 1843,1857 ----
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
! When vectorizing outer-loops, we execute the inner-loop sequentially
! (each vectorized inner-loop iteration contributes to VF outer-loop
! iterations in parallel). We therefore don't allow changing the order
! of the computation in the inner-loop during outer-loop vectorization. */
!
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
! && !nested_in_vect_loop_p (vect_loop, stmt))
ordered_p = false;
else
ordered_p = true;
*************** reduction_code_for_scalar_code (enum tre
*** 2004,2011 ****
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (struct loop *loop, tree phi)
{
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
--- 2075,2084 ----
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (loop_vec_info loop_info, tree phi)
{
+ struct loop *loop = (bb_for_stmt (phi))->loop_father;
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
*************** vect_is_simple_reduction (struct loop *l
*** 2018,2023 ****
--- 2091,2098 ----
imm_use_iterator imm_iter;
use_operand_p use_p;
+ gcc_assert (loop == vect_loop || flow_loop_nested_p (vect_loop, loop));
+
name = PHI_RESULT (phi);
nloop_uses = 0;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
*************** vect_is_simple_reduction (struct loop *l
*** 2129,2136 ****
return NULL_TREE;
}
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations)
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2204,2219 ----
return NULL_TREE;
}
+ /* Generally, when vectorizing a reduction we change the order of the
+ computation. This may change the behavior of the program in some
+ cases, so we need to check that this is ok. One exception is when
+ vectorizing an outer-loop: the inner-loop is executed sequentially,
+ and therefore vectorizing reductions in the inner-loop during
+ outer-loop vectorization is safe. */
+
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2140,2146 ****
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2223,2230 ----
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2169,2181 ****
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop by a GIMPLE_MODIFY_STMT,
! or it's an induction (defined by some phi node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
--- 2253,2268 ----
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop ("vect_loop_def"),
! or it's an induction (defined by a loop-header phi-node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def
! || (TREE_CODE (def1) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def1)))))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
*************** vect_is_simple_reduction (struct loop *l
*** 2187,2193 ****
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
--- 2274,2283 ----
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def
! || (TREE_CODE (def2) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def2)))))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
*************** vectorize_loops (void)
*** 2326,2332 ****
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo);
loop->aux = NULL;
}
--- 2416,2422 ----
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo, true);
loop->aux = NULL;
}
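(Illustration only, not part of the patch.) new_loop_vec_info now re-enumerates the loop blocks in dfs order starting from the loop header, with bb_in_loop_p as the filter. The sketch below shows a predicate-restricted depth-first walk over a mock CFG with the 5-BB outer-loop shape. It is not GCC's dfs_enumerate_from, whose exact visiting order may differ; the block numbering, `mock_succs` and all `mock_*` functions are assumptions made for this example.

```c
#include <assert.h>

#define MOCK_NBBS 6        /* 5 loop blocks plus one exit block */
#define MOCK_MAX_SUCCS 2

/* Successor lists; -1 marks an unused slot.  */
static const int mock_succs[MOCK_NBBS][MOCK_MAX_SUCCS] = {
  { 1, -1 },   /* 0: outer header -> inner body */
  { 2, 3 },    /* 1: inner body   -> inner latch, outer tail */
  { 1, -1 },   /* 2: inner latch  -> inner body */
  { 4, 5 },    /* 3: outer tail   -> outer latch, exit */
  { 0, -1 },   /* 4: outer latch  -> outer header */
  { -1, -1 }   /* 5: exit block, outside the loop */
};

/* Plays the role of bb_in_loop_p: blocks 0..4 belong to the loop.  */
int
mock_bb_in_loop_p (int bb)
{
  return bb >= 0 && bb < 5;
}

/* Pre-order depth-first visit restricted to blocks in the loop.  */
static void
mock_dfs_visit (int bb, int *visited, int *order, int *n)
{
  int k;

  visited[bb] = 1;
  order[(*n)++] = bb;
  for (k = 0; k < MOCK_MAX_SUCCS; k++)
    {
      int succ = mock_succs[bb][k];
      if (succ >= 0 && !visited[succ] && mock_bb_in_loop_p (succ))
        mock_dfs_visit (succ, visited, order, n);
    }
}

/* Fills ORDER starting from START; returns the number of blocks found.  */
int
mock_dfs_enumerate (int start, int *order)
{
  int visited[MOCK_NBBS] = { 0 };
  int n = 0;

  mock_dfs_visit (start, visited, order, &n);
  return n;
}

/* Position of BB in the walk from the outer header, or -1 if the
   block was never reached (i.e. it is outside the loop).  */
int
mock_dfs_position (int bb)
{
  int order[MOCK_NBBS];
  int n = mock_dfs_enumerate (0, order);
  int k;

  for (k = 0; k < n; k++)
    if (order[k] == bb)
      return k;
  return -1;
}
```

In this walk the outer header comes first and the inner-loop body precedes the outer-loop tail, so a def in the inner-loop is seen before the loop-closed phi in the tail that uses it.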
Index: tree-vectorizer.h
===================================================================
*** tree-vectorizer.h (revision 127202)
--- tree-vectorizer.h (working copy)
*************** typedef struct _loop_vec_info {
*** 92,100 ****
/* The loop basic blocks. */
basic_block *bbs;
- /* The loop exit_condition. */
- tree exit_cond;
-
/* Number of iterations. */
tree num_iters;
--- 92,97 ----
*************** typedef struct _loop_vec_info {
*** 144,150 ****
/* Access Functions. */
#define LOOP_VINFO_LOOP(L) (L)->loop
#define LOOP_VINFO_BBS(L) (L)->bbs
- #define LOOP_VINFO_EXIT_COND(L) (L)->exit_cond
#define LOOP_VINFO_NITERS(L) (L)->num_iters
#define LOOP_VINFO_COST_MODEL_MIN_ITERS(L) (L)->min_profitable_iters
#define LOOP_VINFO_VECTORIZABLE_P(L) (L)->vectorizable
--- 141,146 ----
*************** typedef struct _loop_vec_info {
*** 165,170 ****
--- 161,179 ----
#define LOOP_VINFO_NITERS_KNOWN_P(L) \
NITERS_KNOWN_P((L)->num_iters)
+ static inline loop_vec_info
+ loop_vec_info_for_loop (struct loop *loop)
+ {
+ return (loop_vec_info) loop->aux;
+ }
+
+ static inline bool
+ nested_in_vect_loop_p (struct loop *loop, tree stmt)
+ {
+ return (loop->inner
+ && (loop->inner == (bb_for_stmt (stmt))->loop_father));
+ }
+
/*-----------------------------------------------------------------*/
/* Info on vectorized defs. */
/*-----------------------------------------------------------------*/
*************** enum stmt_vec_info_type {
*** 180,191 ****
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
--- 189,203 ----
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type,
! loop_exit_ctrl_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
+ vect_used_in_outer_by_reduction,
+ vect_used_in_outer,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
*************** is_pattern_stmt_p (stmt_vec_info stmt_in
*** 403,408 ****
--- 415,429 ----
return false;
}
+ static inline bool
+ is_loop_header_bb_p (basic_block bb)
+ {
+ if (bb == (bb->loop_father)->header)
+ return true;
+ gcc_assert (EDGE_COUNT (bb->preds) == 1);
+ return false;
+ }
+
/*-----------------------------------------------------------------*/
/* Info on data references alignment. */
/*-----------------------------------------------------------------*/
*************** extern tree get_vectype_for_scalar_type
*** 462,468 ****
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (struct loop *, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
--- 483,489 ----
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (loop_vec_info, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
*************** extern bool supportable_narrowing_operat
*** 474,480 ****
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
--- 495,501 ----
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info, bool);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
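(Illustration only, not part of the patch.) The two new inline predicates, is_loop_header_bb_p and nested_in_vect_loop_p, carry most of the phi classification in this patch. The sketch below mirrors their logic on stripped-down stand-in structures. `struct mock_bb`, `struct mock_loop` and the `mock_*` functions are hypothetical and only keep the fields the predicates read; the gcc_assert on the predecessor count is left out.

```c
#include <assert.h>
#include <stddef.h>

struct mock_loop;

/* Stand-in for basic_block: only the enclosing loop is kept.  */
struct mock_bb
{
  struct mock_loop *loop_father;
};

/* Stand-in for struct loop: header block and (single) inner loop.  */
struct mock_loop
{
  struct mock_bb *header;
  struct mock_loop *inner;
};

/* Mirrors is_loop_header_bb_p: a phi here is a scalar cycle.  */
int
mock_is_loop_header_bb_p (const struct mock_bb *bb)
{
  return bb == bb->loop_father->header;
}

/* Mirrors nested_in_vect_loop_p, taking the stmt's block directly.  */
int
mock_nested_in_vect_loop_p (const struct mock_loop *loop,
                            const struct mock_bb *stmt_bb)
{
  return loop->inner != NULL && loop->inner == stmt_bb->loop_father;
}

enum
{
  OUTER_HEADER_PHI_IS_CYCLE = 1,
  INNER_HEADER_PHI_IS_CYCLE = 2,
  TAIL_PHI_IS_CYCLE = 4,
  INNER_STMT_IS_NESTED = 8,
  TAIL_STMT_IS_NESTED = 16
};

/* Builds a doubly-nested loop and returns a mask describing how the
   predicates classify the outer header, inner header and outer tail.  */
int
mock_classify_nest (void)
{
  struct mock_loop outer, inner;
  struct mock_bb outer_header, inner_header, outer_tail;
  int mask = 0;

  outer_header.loop_father = &outer;
  inner_header.loop_father = &inner;
  outer_tail.loop_father = &outer;
  inner.header = &inner_header;
  inner.inner = NULL;
  outer.header = &outer_header;
  outer.inner = &inner;

  if (mock_is_loop_header_bb_p (&outer_header))
    mask |= OUTER_HEADER_PHI_IS_CYCLE;
  if (mock_is_loop_header_bb_p (&inner_header))
    mask |= INNER_HEADER_PHI_IS_CYCLE;
  if (mock_is_loop_header_bb_p (&outer_tail))
    mask |= TAIL_PHI_IS_CYCLE;
  if (mock_nested_in_vect_loop_p (&outer, &inner_header))
    mask |= INNER_STMT_IS_NESTED;
  if (mock_nested_in_vect_loop_p (&outer, &outer_tail))
    mask |= TAIL_STMT_IS_NESTED;
  return mask;
}
```

A phi in the outer-loop tail is therefore not treated as a scalar cycle: it is the loop-closed phi after the inner-loop, which new_stmt_vec_info marks as vect_loop_def.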
Index: tree-vect-analyze.c
===================================================================
*** tree-vect-analyze.c (revision 127202)
--- tree-vect-analyze.c (working copy)
*************** vect_analyze_operations (loop_vec_info l
*** 325,330 ****
--- 325,348 ----
print_generic_expr (vect_dump, phi, TDF_SLIM);
}
+ if (! is_loop_header_bb_p (bb))
+ {
+ /* inner-loop loop-closed exit phi in outer-loop vectorization
+ (i.e. a phi in the tail of the outer-loop).
+ FORNOW: we currently don't support the case that these phis
+ are not used in the outer-loop, because this case requires
+ us to actually do something here. */
+ if (!STMT_VINFO_RELEVANT_P (stmt_info)
+ || STMT_VINFO_LIVE_P (stmt_info))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump,
+ "Unsupported loop-closed phi in outer-loop.");
+ return false;
+ }
+ continue;
+ }
+
gcc_assert (stmt_info);
if (STMT_VINFO_LIVE_P (stmt_info))
*************** vect_analyze_operations (loop_vec_info l
*** 398,404 ****
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_unused_in_loop);
break;
case vect_induction_def:
--- 416,424 ----
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_used_in_outer
! || relevance == vect_used_in_outer_by_reduction
! || relevance == vect_unused_in_loop);
break;
case vect_induction_def:
*************** exist_non_indexing_operands_for_use_p (t
*** 589,638 ****
}
! /* Function vect_analyze_scalar_cycles.
!
! Examine the cross iteration def-use cycles of scalar variables, by
! analyzing the loop (scalar) PHIs; Classify each cycle as one of the
! following: invariant, induction, reduction, unknown.
!
! Some forms of scalar cycles are not yet supported.
!
! Example1: reduction: (unsupported yet)
!
! loop1:
! for (i=0; i<N; i++)
! sum += a[i];
!
! Example2: induction: (unsupported yet)
!
! loop2:
! for (i=0; i<N; i++)
! a[i] = i;
! Note: the following loop *is* vectorizable:
!
! loop3:
! for (i=0; i<N; i++)
! a[i] = b[i];
!
! even though it has a def-use cycle caused by the induction variable i:
!
! loop: i_2 = PHI (i_0, i_1)
! a[i_2] = ...;
! i_1 = i_2 + 1;
! GOTO loop;
!
! because the def-use cycle in loop3 is considered "not relevant" - i.e.,
! it does not need to be vectorized because it is only used for array
! indexing (see 'mark_stmts_to_be_vectorized'). The def-use cycle in
! loop2 on the other hand is relevant (it is being written to memory).
! */
static void
! vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
{
tree phi;
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
--- 609,625 ----
}
! /* Function vect_analyze_scalar_cycles_1.
! Examine the cross iteration def-use cycles of scalar variables
! in LOOP. LOOP_VINFO represents the loop that is now being
! considered for vectorization (can be LOOP, or an outer-loop
! enclosing LOOP). */
static void
! vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
{
tree phi;
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 698,704 ****
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
--- 685,691 ----
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop_vinfo, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 717,722 ****
--- 704,751 ----
}
+ /* Function vect_analyze_scalar_cycles.
+
+ Examine the cross iteration def-use cycles of scalar variables, by
+ analyzing the loop-header PHIs of scalar variables; Classify each
+ cycle as one of the following: invariant, induction, reduction, unknown.
+ We do that for the loop represented by LOOP_VINFO, and also for its
+ inner-loop, if it exists.
+ Examples for scalar cycles:
+
+ Example1: reduction:
+
+ loop1:
+ for (i=0; i<N; i++)
+ sum += a[i];
+
+ Example2: induction:
+
+ loop2:
+ for (i=0; i<N; i++)
+ a[i] = i; */
+
+ static void
+ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop);
+
+ /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
+ Reductions in such an inner-loop therefore have different properties than
+ the reductions in the nest that gets vectorized:
+ 1. When vectorized, they are executed in the same order as in the original
+ scalar loop, so we can't change the order of computation when
+ vectorizing them.
+ 2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
+ current checks are too strict. */
+
+ if (loop->inner)
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
+ }
+
+
/* Function vect_insert_into_interleaving_chain.
Insert DRA into the interleaving chain of DRB according to DRA's INIT. */
*************** vect_enhance_data_refs_alignment (loop_v
*** 1722,1728 ****
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning = flag_tree_vect_loop_version && (!optimize_size);
if (do_versioning)
{
--- 1751,1760 ----
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning =
! flag_tree_vect_loop_version
! && (!optimize_size)
! && (!loop->inner);
if (do_versioning)
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2105,2110 ****
--- 2137,2143 ----
{
tree stmt;
stmt_vec_info stmt_info;
+ basic_block bb;
if (!dr || !DR_REF (dr))
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2117,2122 ****
--- 2150,2165 ----
stmt = DR_STMT (dr);
stmt_info = vinfo_for_stmt (stmt);
+ /* If outer-loop vectorization: we don't yet support datarefs
+ in the innermost loop. */
+ bb = bb_for_stmt (stmt);
+ if (bb->loop_father != LOOP_VINFO_LOOP (loop_vinfo))
+ {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+ fprintf (vect_dump, "not vectorized: data-ref in nested loop");
+ return false;
+ }
+
if (STMT_VINFO_DATA_REF (stmt_info))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
*************** vect_mark_relevant (VEC(tree,heap) **wor
*** 2204,2214 ****
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to vectorized.
Instead mark the pattern-stmt that replaces it. */
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
- pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
--- 2247,2259 ----
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to be vectorized.
Instead mark the pattern-stmt that replaces it. */
+
+ pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2258,2264 ****
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt) && (stmt != LOOP_VINFO_EXIT_COND (loop_vinfo)))
*relevant = vect_used_in_loop;
/* changing memory. */
--- 2303,2310 ----
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt)
! && STMT_VINFO_TYPE (vinfo_for_stmt (stmt)) != loop_exit_ctrl_vec_info_type)
*relevant = vect_used_in_loop;
/* changing memory. */
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2315,2320 ****
--- 2361,2368 ----
of the respective DEF_STMT is left unchanged.
- case 2: If STMT is a reduction phi and DEF_STMT is a reduction stmt, we
skip DEF_STMT cause it had already been processed.
+ - case 3: If DEF_STMT and STMT are in different nests, then "relevant" will
+ be modified accordingly.
Return true if everything is as expected. Return false otherwise. */
*************** process_use (tree stmt, tree use, loop_v
*** 2325,2331 ****
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block def_bb;
tree def, def_stmt;
enum vect_def_type dt;
--- 2373,2379 ----
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block bb, def_bb;
tree def, def_stmt;
enum vect_def_type dt;
*************** process_use (tree stmt, tree use, loop_v
*** 2346,2362 ****
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! return true;
! /* case 2: A reduction phi defining a reduction stmt (DEF_STMT). DEF_STMT
! must have already been processed, so we just check that everything is as
! expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def)
{
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
--- 2394,2420 ----
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (vect_dump, "def_stmt is out of loop.");
! return true;
! }
! /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
! DEF_STMT must have already been processed, because this should be the
! only way that STMT, which is a reduction-phi, was put in the worklist,
! as there should be no other uses for DEF_STMT in the loop. So we just
! check that everything is as expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
+ bb = bb_for_stmt (stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def
! && bb->loop_father == def_bb->loop_father)
{
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
*************** process_use (tree stmt, tree use, loop_v
*** 2365,2370 ****
--- 2423,2495 ----
return true;
}
+ /* case 3a: outer-loop stmt defining an inner-loop stmt:
+ outer-loop-header-bb:
+ d = def_stmt
+ inner-loop:
+ stmt # use (d)
+ outer-loop-tail-bb:
+ ... */
+ if (flow_loop_nested_p (def_bb->loop_father, bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop def-stmt defining inner-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_by_reduction : vect_unused_in_loop;
+ break;
+ case vect_used_in_outer_by_reduction:
+ relevant = vect_used_by_reduction;
+ break;
+ case vect_used_in_outer:
+ relevant = vect_used_in_loop;
+ break;
+ case vect_used_by_reduction:
+ case vect_used_in_loop:
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
+ /* case 3b: inner-loop stmt defining an outer-loop stmt:
+ outer-loop-header-bb:
+ ...
+ inner-loop:
+ d = def_stmt
+ outer-loop-tail-bb:
+ stmt # use (d) */
+ else if (flow_loop_nested_p (bb->loop_father, def_bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "inner-loop def-stmt defining outer-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_in_outer_by_reduction : vect_unused_in_loop;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ break;
+
+ case vect_used_by_reduction:
+ relevant = vect_used_in_outer_by_reduction;
+ break;
+
+ case vect_used_in_loop:
+ relevant = vect_used_in_outer;
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
vect_mark_relevant (worklist, def_stmt, relevant, live_p);
return true;
}
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2473,2497 ****
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt; Other
! reduction stmts are expected to be unused in the loop. These are the
! expected values of "relevant" for reduction phis/stmts in the loop:
relevance: phi stmt
vect_unused_in_loop ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! switch (relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
break;
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
--- 2598,2635 ----
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt, or by
! a stmt in an outer loop; other reduction stmts are expected to be unused
! in the loop, and possibly used by a stmt in an outer loop.
! Here are the expected values of "relevant" for reduction phis/stmts:
relevance: phi stmt
vect_unused_in_loop ok
+ vect_used_in_outer_by_reduction ok ok
+ vect_used_in_outer ok ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! enum vect_relevant tmp_relevant = relevant;
! switch (tmp_relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
+ relevant = vect_used_by_reduction;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ gcc_assert (TREE_CODE (stmt) != WIDEN_SUM_EXPR
+ && TREE_CODE (stmt) != DOT_PROD_EXPR);
break;
+
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
+ /* fall through */
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2499,2505 ****
VEC_free (tree, heap, worklist);
return false;
}
- relevant = vect_used_by_reduction;
live_p = false;
}
--- 2637,2642 ----
*************** vect_get_loop_niters (struct loop *loop,
*** 2641,2651 ****
}
/* Function vect_analyze_loop_form.
! Verify the following restrictions (some may be relaxed in the future):
! - it's an inner-most loop
! - number of BBs = 2 (which are the loop header and the latch)
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
--- 2778,2816 ----
}
+ /* Function vect_analyze_loop_1.
+
+ Apply a set of analyses on LOOP, and create a loop_vec_info struct
+ for it. The different analyses will record information in the
+ loop_vec_info struct. This is a subset of the analyses applied in
+ vect_analyze_loop, to be applied on an inner-loop nested in the loop
+ that is now considered for (outer-loop) vectorization. */
+
+ static loop_vec_info
+ vect_analyze_loop_1 (struct loop *loop)
+ {
+ loop_vec_info loop_vinfo;
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "===== analyze_loop_nest_1 =====");
+
+ /* Check the CFG characteristics of the loop (nesting, entry/exit, etc.). */
+
+ loop_vinfo = vect_analyze_loop_form (loop);
+ if (!loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "bad inner-loop form.");
+ return NULL;
+ }
+
+ return loop_vinfo;
+ }
+
+
/* Function vect_analyze_loop_form.
! Verify that certain CFG restrictions hold, including:
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
*************** vect_analyze_loop_form (struct loop *loo
*** 2657,2687 ****
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! if (loop->inner)
{
! if (vect_print_dump_info (REPORT_OUTER_LOOPS))
! fprintf (vect_dump, "not vectorized: nested loop.");
return NULL;
}
if (!single_exit (loop)
- || loop->num_nodes != 2
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
- else if (loop->num_nodes != 2)
- fprintf (vect_dump, "not vectorized: too many BBs in loop.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
!
return NULL;
}
--- 2822,2945 ----
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
+ loop_vec_info inner_loop_vinfo = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! /* Different restrictions apply when we are considering an inner-most loop,
! vs. an outer (nested) loop.
! (FORNOW. May want to relax some of these restrictions in the future). */
!
! if (!loop->inner)
! {
! /* Inner-most loop. We currently require that the number of BBs is
! exactly 2 (the header and latch). Vectorizable inner-most loops
! look like this:
!
! (pre-header)
! |
! header <--------+
! | | |
! | +--> latch --+
! |
! (exit-bb) */
!
! if (loop->num_nodes != 2)
! {
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: too many BBs in loop.");
! return NULL;
! }
!
! if (empty_block_p (loop->header))
{
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: empty loop.");
return NULL;
}
+ }
+ else
+ {
+ struct loop *innerloop = loop->inner;
+ edge backedge, entryedge;
+
+ /* Nested loop. We currently require that the loop is doubly-nested,
+ contains a single inner loop, and the number of BBs is exactly 5.
+ Vectorizable outer-loops look like this:
+
+ (pre-header)
+ |
+ header <---+
+ | |
+ inner-loop |
+ | |
+ tail ------+
+ |
+ (exit-bb)
+
+ The inner-loop has the properties expected of inner-most loops
+ as described above. */
+
+ if ((loop->inner)->inner || (loop->inner)->next)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: multiple nested loops.");
+ return NULL;
+ }
+
+ /* Analyze the inner-loop. */
+ inner_loop_vinfo = vect_analyze_loop_1 (loop->inner);
+ if (!inner_loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: Bad inner loop.");
+ return NULL;
+ }
+
+ if (loop->num_nodes != 5)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: too many BBs in loop.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ gcc_assert (EDGE_COUNT (innerloop->header->preds) == 2);
+ backedge = EDGE_PRED (innerloop->header, 1);
+ entryedge = EDGE_PRED (innerloop->header, 0);
+ if (EDGE_PRED (innerloop->header, 0)->src == innerloop->latch)
+ {
+ backedge = EDGE_PRED (innerloop->header, 0);
+ entryedge = EDGE_PRED (innerloop->header, 1);
+ }
+
+ if (entryedge->src != loop->header
+ || !single_exit (innerloop)
+ || single_exit (innerloop)->dest != EDGE_PRED (loop->latch, 0)->src)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: unsupported outerloop form.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "Considering outer-loop vectorization.");
+ }
if (!single_exit (loop)
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2694,2699 ****
--- 2952,2959 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: unexpected loop form.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2711,2732 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
return NULL;
}
}
- if (empty_block_p (loop->header))
- {
- if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
- fprintf (vect_dump, "not vectorized: empty loop.");
- return NULL;
- }
-
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
return NULL;
}
--- 2971,2989 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
}
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2735,2740 ****
--- 2992,2999 ----
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump,
"not vectorized: number of iterations cannot be computed.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2742,2748 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! return false;
}
if (!NITERS_KNOWN_P (number_of_iterations))
--- 3001,3009 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
! return NULL;
}
if (!NITERS_KNOWN_P (number_of_iterations))
*************** vect_analyze_loop_form (struct loop *loo
*** 2757,2768 ****
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
! LOOP_VINFO_EXIT_COND (loop_vinfo) = loop_cond;
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
--- 3018,3036 ----
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, false);
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
!
! STMT_VINFO_TYPE (vinfo_for_stmt (loop_cond)) = loop_exit_ctrl_vec_info_type;
!
! /* CHECKME: May want to keep it around in the future. */
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, false);
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
*************** vect_analyze_loop (struct loop *loop)
*** 2784,2789 ****
--- 3052,3066 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "===== analyze_loop_nest =====");
+ if (loop_outer (loop)
+ && loop_vec_info_for_loop (loop_outer (loop))
+ && LOOP_VINFO_VECTORIZABLE_P (loop_vec_info_for_loop (loop_outer (loop))))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop already vectorized.");
+ return NULL;
+ }
+
/* Check the CFG characteristics of the loop (nesting, entry/exit, etc. */
loop_vinfo = vect_analyze_loop_form (loop);
*************** vect_analyze_loop (struct loop *loop)
*** 2805,2811 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3082,3088 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2823,2829 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3100,3106 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2835,2841 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3112,3118 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2844,2850 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3121,3127 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2856,2862 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3133,3139 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2868,2874 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3145,3151 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2880,2886 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3157,3163 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2892,2898 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3169,3175 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
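
[Reader's note on cases 3a/3b in process_use above.] The two switch
statements translate the "relevant" value whenever a use crosses the
boundary between the outer loop and the inner loop. The following is only a
standalone sketch of that mapping, not code from the patch: the enum is a
stand-in (its names follow the patch, its numeric order is an assumption),
and the two function names are hypothetical.

```c
#include <assert.h>

/* Stand-in for the vectorizer's relevance enum.  The names follow the
   patch; the order of the enumerators is an assumption.  */
enum relevance
{
  unused_in_loop,
  used_in_outer_by_reduction,
  used_in_outer,
  used_by_reduction,
  used_in_loop
};

/* Case 3a: DEF_STMT is in the outer loop and the stmt that uses it is in
   the inner loop.  "Outer" flavours become their plain counterparts.  */
enum relevance
outer_def_inner_use (enum relevance relevant, int use_is_reduction)
{
  switch (relevant)
    {
    case unused_in_loop:
      return use_is_reduction ? used_by_reduction : unused_in_loop;
    case used_in_outer_by_reduction:
      return used_by_reduction;
    case used_in_outer:
      return used_in_loop;
    case used_by_reduction:
    case used_in_loop:
      return relevant;
    default:
      assert (0);	/* Mirrors gcc_unreachable () in the patch.  */
      return relevant;
    }
}

/* Case 3b: DEF_STMT is in the inner loop and the stmt that uses it is in
   the outer loop.  Plain flavours become their "outer" counterparts.  */
enum relevance
inner_def_outer_use (enum relevance relevant, int use_is_reduction)
{
  switch (relevant)
    {
    case unused_in_loop:
      return use_is_reduction ? used_in_outer_by_reduction : unused_in_loop;
    case used_in_outer_by_reduction:
    case used_in_outer:
      return relevant;
    case used_by_reduction:
      return used_in_outer_by_reduction;
    case used_in_loop:
      return used_in_outer;
    default:
      assert (0);	/* Mirrors gcc_unreachable () in the patch.  */
      return relevant;
    }
}
```

Read this way, the two cases are mirror images: 3a strips the "in_outer"
qualifier when the use moves into the inner loop, and 3b adds it when the
use moves out to the outer loop.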
Index: tree-vect-patterns.c
===================================================================
*** tree-vect-patterns.c (revision 127202)
--- tree-vect-patterns.c (working copy)
*************** widened_name_p (tree name, tree use_stmt
*** 148,154 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
! */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 148,161 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
!
! Note: The dot-prod idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized). */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
*************** vect_recog_dot_prod_pattern (tree last_s
*** 160,165 ****
--- 167,174 ----
tree type, half_type;
tree pattern_expr;
tree prod_type;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 242,247 ****
--- 251,260 ----
gcc_assert (stmt_vinfo);
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_loop_def)
return NULL;
+ /* FORNOW. Can continue analyzing the def-use chain when this stmt is a phi
+ inside the loop (in case we are analyzing an outer-loop). */
+ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
+ return NULL;
expr = GIMPLE_STMT_OPERAND (stmt, 1);
if (TREE_CODE (expr) != MULT_EXPR)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 295,300 ****
--- 308,323 ----
fprintf (vect_dump, "vect_recog_dot_prod_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_dot_prod_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
*************** vect_recog_pow_pattern (tree last_stmt,
*** 521,527 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
! */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 544,557 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
!
! Note: The widening-sum idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized). */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
*************** vect_recog_widen_sum_pattern (tree last_
*** 531,536 ****
--- 561,568 ----
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
tree type, half_type;
tree pattern_expr;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_widen_sum_pattern (tree last_
*** 580,585 ****
--- 612,627 ----
fprintf (vect_dump, "vect_recog_widen_sum_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_widen_sum_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
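
[Reader's note on the dot-prod / widen-sum restriction above.] The comments
say these idioms produce only N/2 widened results from N inputs. The scalar
model below is a sketch of why that matters, not code from the patch: the
function names are hypothetical, and pairing adjacent elements is an
assumption (which elements a target actually pairs is target-dependent).

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of one widening-sum step: N narrow inputs are folded into
   N/2 wide accumulators.  Adjacent pairing is assumed for illustration.  */
void
model_widen_sum (const short *in, size_t n, int *acc)
{
  assert (n % 2 == 0);
  for (size_t k = 0; k < n / 2; k++)
    acc[k] += (int) in[2 * k] + (int) in[2 * k + 1];
}

/* An order-insensitive consumer: only the grand total is needed, as in a
   reduction of the loop that is itself being vectorized.  */
int
model_total (const int *acc, size_t lanes)
{
  int sum = 0;
  for (size_t k = 0; k < lanes; k++)
    sum += acc[k];
  return sum;
}
```

When only the total is needed, folding pairs is harmless. In an inner loop
nested in a vectorized outer loop, however, each of the N elements belongs
to a different outer-loop iteration and needs its own result, which N/2
accumulators cannot supply. That appears to be the reason both recognizers
return NULL when nested_in_vect_loop_p holds.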
Index: tree-vect-transform.c
===================================================================
*** tree-vect-transform.c (revision 127202)
--- tree-vect-transform.c (working copy)
*************** vect_estimate_min_profitable_iters (loop
*** 125,130 ****
--- 125,131 ----
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
int nbbs = loop->num_nodes;
int byte_misalign;
+ int innerloop_iters, factor;
/* Cost model disabled. */
if (!flag_vect_cost_model)
*************** vect_estimate_min_profitable_iters (loop
*** 153,163 ****
--- 154,173 ----
TODO: Consider assigning different costs to different scalar
statements. */
+ /* FORNOW. */
+ if (loop->inner)
+ innerloop_iters = 50; /* FIXME */
+
for (i = 0; i < nbbs; i++)
{
block_stmt_iterator si;
basic_block bb = bbs[i];
+ if (bb->loop_father == loop->inner)
+ factor = innerloop_iters;
+ else
+ factor = 1;
+
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
*************** vect_estimate_min_profitable_iters (loop
*** 165,172 ****
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt);
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
--- 175,184 ----
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt) * factor;
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
! /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
! some of the "outside" costs are generated inside the outer-loop. */
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
*************** vect_init_vector (tree stmt, tree vector
*** 1072,1077 ****
--- 1084,1092 ----
tree new_temp;
basic_block new_bb;
+ if (nested_in_vect_loop_p (loop, stmt))
+ loop = loop->inner;
+
new_var = vect_get_new_vect_var (vector_type, vect_simple_var, "cst_");
add_referenced_var (new_var);
*************** vect_init_vector (tree stmt, tree vector
*** 1097,1102 ****
--- 1112,1118 ----
/* Function get_initial_def_for_induction
Input:
+ STMT - a stmt that performs an induction operation in the loop.
IV_PHI - the initial value of the induction variable
Output:
*************** get_initial_def_for_induction (tree iv_p
*** 1115,1122 ****
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
basic_block new_bb;
- block_stmt_iterator bsi;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
--- 1131,1138 ----
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
+ struct loop *iv_loop;
basic_block new_bb;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
*************** get_initial_def_for_induction (tree iv_p
*** 1130,1137 ****
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
tree stmts;
! tree stmt = NULL_TREE;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
--- 1146,1158 ----
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
+ bool nested_in_vect_loop = false;
tree stmts;
! imm_use_iterator imm_iter;
! use_operand_p use_p;
! tree exit_phi;
! edge latch_e;
! tree loop_arg;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1140,1204 ****
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
- stmt = bsi_stmt (si);
! access_fn = analyze_scalar_evolution (loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
/* Create the vector that holds the initial_value of the induction. */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
!
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
{
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
}
!
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, new_name, t);
! for (i = 1; i < nunits; i++)
{
! tree tmp;
! /* Create: new_name = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
}
- vec = build_constructor_from_list (vectype, nreverse (t));
- vec_init = vect_init_vector (stmt, vec, vectype);
/* Create the vector that holds the step of the induction. */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
/* Create the following def-use cycle:
loop prolog:
! vec_init = [X, X+S, X+2*S, X+3*S]
! vec_step = [VF*S, VF*S, VF*S, VF*S]
loop:
vec_iv = PHI <vec_init, vec_loop>
...
--- 1161,1267 ----
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
! if (INTEGRAL_TYPE_P (scalar_type))
! step_expr = build_int_cst (scalar_type, 0);
! else
! step_expr = build_real (scalar_type, dconst0);
!
! /* Is phi in an inner-loop, while vectorizing an enclosing outer-loop? */
! if (nested_in_vect_loop_p (loop, iv_phi))
! {
! nested_in_vect_loop = true;
! iv_loop = loop->inner;
! }
! else
! iv_loop = loop;
! gcc_assert (iv_loop == (bb_for_stmt (iv_phi))->loop_father);
!
! latch_e = loop_latch_edge (iv_loop);
! loop_arg = PHI_ARG_DEF_FROM_EDGE (iv_phi, latch_e);
!
! access_fn = analyze_scalar_evolution (iv_loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (iv_loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
+ pe = loop_preheader_edge (iv_loop);
/* Create the vector that holds the initial_value of the induction. */
! if (nested_in_vect_loop)
{
! /* iv_loop is nested in the loop to be vectorized. init_expr had already
! been created during vectorization of previous stmts; We obtain it from
! the STMT_VINFO_VEC_STMT of the defining stmt. */
! tree iv_def = PHI_ARG_DEF_FROM_EDGE (iv_phi, loop_preheader_edge (iv_loop));
! vec_init = vect_get_vec_def_for_operand (iv_def, iv_phi, NULL);
}
! else
{
! /* iv_loop is the loop to be vectorized. Create:
! vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr) */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
!
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
! {
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
! }
!
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, init_expr, t);
! for (i = 1; i < nunits; i++)
! {
! tree tmp;
! /* Create: new_name_i = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
! }
! /* Create a vector from [new_name_0, new_name_1, ..., new_name_nunits-1] */
! vec = build_constructor_from_list (vectype, nreverse (t));
! vec_init = vect_init_vector (iv_phi, vec, vectype);
}
/* Create the vector that holds the step of the induction. */
! if (nested_in_vect_loop)
! /* iv_loop is nested in the loop to be vectorized. Generate:
! vec_step = [S, S, S, S] */
! new_name = step_expr;
! else
! {
! /* iv_loop is the loop to be vectorized. Generate:
! vec_step = [VF*S, VF*S, VF*S, VF*S] */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
! }
!
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype);
/* Create the following def-use cycle:
loop prolog:
! vec_init = ...
! vec_step = ...
loop:
vec_iv = PHI <vec_init, vec_loop>
...
*************** get_initial_def_for_induction (tree iv_p
*** 1209,1215 ****
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
--- 1272,1278 ----
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, iv_loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1220,1234 ****
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, loop_preheader_edge (loop));
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (loop));
! /* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
--- 1283,1298 ----
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, pe);
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop));
! /* In case that vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
*************** get_initial_def_for_induction (tree iv_p
*** 1237,1242 ****
--- 1301,1308 ----
if (ncopies > 1)
{
stmt_vec_info prev_stmt_vinfo;
+ /* FORNOW. This restriction should be relaxed. */
+ gcc_assert (!nested_in_vect_loop);
/* Create the vector that holds the step of the induction. */
expr = build_int_cst (scalar_type, nunits);
*************** get_initial_def_for_induction (tree iv_p
*** 1245,1251 ****
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
--- 1311,1317 ----
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1253,1271 ****
{
tree tmp;
! /* vec_i = vec_prev + vec_{step*nunits} */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
!
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
--- 1319,1368 ----
{
tree tmp;
! /* vec_i = vec_prev + vec_step */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
+ if (nested_in_vect_loop)
+ {
+ /* Find the loop-closed exit-phi of the induction, and record
+ the final vector of induction results: */
+ exit_phi = NULL;
+ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
+ {
+ if (!flow_bb_inside_loop_p (iv_loop, bb_for_stmt (USE_STMT (use_p))))
+ {
+ exit_phi = USE_STMT (use_p);
+ break;
+ }
+ }
+ if (exit_phi)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+ /* FORNOW. Currently not supporting the case that an inner-loop induction
+ is not used in the outer-loop (i.e. only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = new_stmt;
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of inductions after inner-loop:");
+ print_generic_expr (vect_dump, new_stmt, TDF_SLIM);
+ }
+ }
+ }
+
+
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
*************** vect_get_vec_def_for_operand (tree op, t
*** 1301,1307 ****
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vec_inv;
tree vec_cst;
tree t = NULL_TREE;
--- 1398,1403 ----
*************** vect_get_vec_def_for_operand (tree op, t
*** 1387,1400 ****
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
--- 1483,1502 ----
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! if (TREE_CODE (vec_stmt) == PHI_NODE)
! vec_oprnd = PHI_RESULT (vec_stmt);
! else
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
+ struct loop *loop;
+
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
+ loop = (bb_for_stmt (def_stmt))->loop_father;
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
*************** vect_get_vec_def_for_operand (tree op, t
*** 1406,1413 ****
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def before the loop */
! return get_initial_def_for_induction (def_stmt);
}
default:
--- 1508,1519 ----
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def from the vectorized stmt. */
! def_stmt_info = vinfo_for_stmt (def_stmt);
! vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
! gcc_assert (vec_stmt && (TREE_CODE (vec_stmt) == PHI_NODE));
! vec_oprnd = PHI_RESULT (vec_stmt);
! return vec_oprnd;
}
default:
*************** vect_get_vec_def_for_stmt_copy (enum vec
*** 1488,1494 ****
vec_stmt_for_operand = STMT_VINFO_RELATED_STMT (def_stmt_info);
gcc_assert (vec_stmt_for_operand);
vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt_for_operand, 0);
-
return vec_oprnd;
}
--- 1594,1599 ----
*************** vect_finish_stmt_generation (tree stmt,
*** 1504,1510 ****
--- 1609,1619 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (stmt == bsi_stmt (*bsi));
+ gcc_assert (TREE_CODE (stmt) != LABEL_EXPR);
+
bsi_insert_before (bsi, vec_stmt, BSI_SAME_STMT);
+
set_stmt_info (get_stmt_ann (vec_stmt),
new_stmt_vec_info (vec_stmt, loop_vinfo));
*************** static tree
*** 1572,1577 ****
--- 1681,1688 ----
get_initial_def_for_reduction (tree stmt, tree init_val, tree *adjustment_def)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
*************** get_initial_def_for_reduction (tree stmt
*** 1582,1589 ****
--- 1693,1706 ----
tree t = NULL_TREE;
int i;
tree vector_type;
+ bool nested_in_vect_loop = false;
gcc_assert (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
+ if (nested_in_vect_loop_p (loop, stmt))
+ nested_in_vect_loop = true;
+ else
+ gcc_assert (loop == (bb_for_stmt (stmt))->loop_father);
+
vecdef = vect_get_vec_def_for_operand (init_val, stmt, NULL);
switch (code)
*************** get_initial_def_for_reduction (tree stmt
*** 1591,1597 ****
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
--- 1708,1717 ----
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! if (nested_in_vect_loop)
! *adjustment_def = vecdef;
! else
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
*************** vect_create_epilog_for_reduction (tree v
*** 1680,1703 ****
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp;
tree new_name;
! tree epilog_stmt;
! tree new_scalar_dest, exit_phi;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree scalar_initial_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result;
! tree reduction_op;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
int op_type;
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
--- 1800,1830 ----
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp = NULL_TREE;
tree new_name;
! tree epilog_stmt = NULL_TREE;
! tree new_scalar_dest, exit_phi, new_dest;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree adjustment_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result = false;
! tree reduction_op, expr;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
+ bool nested_in_vect_loop = false;
int op_type;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ nested_in_vect_loop = true;
+ }
+
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
*************** vect_create_epilog_for_reduction (tree v
*** 1710,1716 ****
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &scalar_initial_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
--- 1837,1843 ----
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &adjustment_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
*************** vect_create_epilog_for_reduction (tree v
*** 1789,1794 ****
--- 1916,1930 ----
bitsize = TYPE_SIZE (scalar_type);
bytesize = TYPE_SIZE_UNIT (scalar_type);
+
+ /* In case this is a reduction in an inner-loop while vectorizing an outer
+ loop - we don't need to extract a single scalar result at the end of the
+ inner-loop. The final vector of partial results will be used in the
+ vectorized outer-loop, or reduced to a scalar result at the end of the
+ outer-loop. */
+ if (nested_in_vect_loop)
+ goto vect_finalize_reduction;
+
/* 2.3 Create the reduction code, using one of the three schemes described
above. */
*************** vect_create_epilog_for_reduction (tree v
*** 1935,1940 ****
--- 2071,2077 ----
{
tree rhs;
+ gcc_assert (!nested_in_vect_loop);
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "extract scalar result");
*************** vect_create_epilog_for_reduction (tree v
*** 1953,1977 ****
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! /* 2.4 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'scalar_initial_def' is zero).
! Create:
! s_out4 = scalar_expr <s_out3, scalar_initial_def> */
!
! if (scalar_initial_def)
{
! tree tmp = build2 (code, scalar_type, new_temp, scalar_initial_def);
! epilog_stmt = build_gimple_modify_stmt (new_scalar_dest, tmp);
! new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
- /* 2.6 Replace uses of s_out0 with uses of s_out3 */
! /* Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
--- 2090,2131 ----
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! vect_finalize_reduction:
!
! /* 2.5 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'adjustment_def' is zero). For example, if code is PLUS we create:
! new_temp = loop_exit_def + adjustment_def */
! if (adjustment_def)
{
! if (nested_in_vect_loop)
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
! expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, vectype);
! }
! else
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
! expr = build2 (code, scalar_type, new_temp, adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, scalar_type);
! }
! epilog_stmt = build_gimple_modify_stmt (new_dest, expr);
! new_temp = make_ssa_name (new_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
+ #if 0
+ bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
+ #else
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
+ #endif
}
! /* 2.6 Handle the loop-exit phi */
!
! /* Replace uses of s_out0 with uses of s_out3:
! Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
*************** vect_create_epilog_for_reduction (tree v
*** 1985,1990 ****
--- 2139,2167 ----
}
/* We expect to have found an exit_phi because of loop-closed-ssa form. */
gcc_assert (exit_phi);
+
+ if (nested_in_vect_loop)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+
+ /* FORNOW. Currently not supporting the case that an inner-loop reduction
+ is not used in the outer-loop (but only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ epilog_stmt = adjustment_def ? epilog_stmt : new_phi;
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt;
+ set_stmt_info (get_stmt_ann (epilog_stmt),
+ new_stmt_vec_info (epilog_stmt, loop_vinfo));
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of partial results after inner-loop:");
+ print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM);
+ }
+ return;
+ }
+
/* Replace the uses: */
orig_name = PHI_RESULT (exit_phi);
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
*************** vectorizable_reduction (tree stmt, block
*** 2066,2080 ****
tree new_stmt = NULL_TREE;
int j;
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT_P (stmt_info))
return false;
! if (!STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
--- 2243,2272 ----
tree new_stmt = NULL_TREE;
int j;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ /* FORNOW. This restriction should be relaxed. */
+ if (ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+ }
+
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
return false;
! /* Reductions that are not used even in an enclosing outer-loop,
! are expected to be "live" (used out of the loop). */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_unused_in_loop
! && !STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
*************** vectorizable_reduction (tree stmt, block
*** 2131,2139 ****
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
--- 2323,2331 ----
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
*************** vectorizable_call (tree stmt, block_stmt
*** 2358,2363 ****
--- 2550,2556 ----
int nunits_in;
int nunits_out;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree fndecl, rhs, new_temp, def, def_stmt, rhs_type, lhs_type;
enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
tree new_stmt;
*************** vectorizable_call (tree stmt, block_stmt
*** 2467,2472 ****
--- 2660,2673 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!vec_stmt) /* transformation not required. */
{
STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
*************** vectorizable_call (tree stmt, block_stmt
*** 2481,2486 ****
--- 2682,2695 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "transform operation.");
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Handle def. */
scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
*************** vectorizable_conversion (tree stmt, bloc
*** 2672,2677 ****
--- 2881,2887 ----
tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
tree new_temp;
*************** vectorizable_conversion (tree stmt, bloc
*** 2753,2758 ****
--- 2963,2976 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Check the operands of the operation. */
if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
{
*************** vectorizable_operation (tree stmt, block
*** 3094,3099 ****
--- 3312,3318 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code;
enum machine_mode vec_mode;
tree new_temp;
*************** vectorizable_operation (tree stmt, block
*** 3112,3117 ****
--- 3331,3343 ----
int j;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_type_demotion (tree stmt, b
*** 3374,3379 ****
--- 3600,3606 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK;
tree new_temp;
tree def, def_stmt;
*************** vectorizable_type_demotion (tree stmt, b
*** 3426,3431 ****
--- 3653,3665 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_type_promotion (tree stmt,
*** 3523,3528 ****
--- 3757,3763 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
int op_type;
*************** vectorizable_type_promotion (tree stmt,
*** 3576,3581 ****
--- 3811,3823 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_store (tree stmt, block_stm
*** 3868,3873 ****
--- 4110,4116 ----
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum machine_mode vec_mode;
tree dummy;
enum dr_alignment_support alignment_support_cheme;
*************** vectorizable_store (tree stmt, block_stm
*** 3885,3890 ****
--- 4128,4140 ----
unsigned int group_size, i;
VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_load (tree stmt, block_stmt
*** 4546,4551 ****
--- 4796,4810 ----
bool strided_load = false;
tree first_stmt;
+ gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_live_operation (tree stmt,
*** 4842,4847 ****
--- 5101,5107 ----
tree operation;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
int i;
int op_type;
tree op;
*************** vectorizable_live_operation (tree stmt,
*** 4859,4864 ****
--- 5119,5128 ----
if (TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 0)) != SSA_NAME)
return false;
+ /* FORNOW. CHECKME. */
+ if (nested_in_vect_loop_p (loop, stmt))
+ return false;
+
operation = GIMPLE_STMT_OPERAND (stmt, 1);
op_type = TREE_OPERAND_LENGTH (operation);
*************** vect_transform_loop (loop_vec_info loop_
*** 6067,6074 ****
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
stmt_info = vinfo_for_stmt (stmt);
! gcc_assert (stmt_info);
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
--- 6331,6348 ----
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
+
stmt_info = vinfo_for_stmt (stmt);
!
! /* vector stmts created in the outer-loop during vectorization of
! stmts in an inner-loop may not have a stmt_info, and do not
! need to be vectorized. */
! if (!stmt_info)
! {
! bsi_next (&si);
! continue;
! }
!
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
*************** vect_transform_loop (loop_vec_info loop_
*** 6140,6143 ****
--- 6414,6419 ----
if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
fprintf (vect_dump, "LOOP VECTORIZED.");
+ if (loop->inner && vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
+ fprintf (vect_dump, "OUTER LOOP VECTORIZED.");
}
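
For readers following the get_initial_def_for_induction changes above, here
is a minimal scalar sketch of the two cases. It is not GCC code: the name
model_induction_vectors, the fixed NUNITS of 4, and the use of plain int
arrays in place of vector trees are all assumptions made for illustration.
When the induction belongs to the loop being vectorized, the initial vector is
[X, X+S, X+2*S, X+3*S] and the step vector is [VF*S, ...]; when the induction
sits in an inner-loop nested in the vectorized loop, the initial vector comes
from the already-vectorized def and the step vector is [S, ...].

```c
#include <assert.h>
#include <stddef.h>

#define NUNITS 4  /* elements per vector; assumed for illustration only */

/* Hypothetical scalar model of the vec_init/vec_step construction.
   X is the scalar initial value, S the scalar step, VF the vectorization
   factor.  OUTER_INIT, when non-NULL, stands for the vector def that was
   already created for the inner-loop phi's preheader argument (the
   nested-in-vectorized-loop case); when NULL, the induction belongs to
   the loop being vectorized.  */
static void
model_induction_vectors (const int *outer_init, int x, int s, int vf,
                         int vec_init[NUNITS], int vec_step[NUNITS])
{
  int i;

  assert (vf >= NUNITS && vf % NUNITS == 0);

  for (i = 0; i < NUNITS; i++)
    {
      if (outer_init)
        {
          /* Nested case: reuse the vectorized def, step by S.  */
          vec_init[i] = outer_init[i];
          vec_step[i] = s;
        }
      else
        {
          /* Vectorized-loop case: [X, X+S, ...], step by VF*S.  */
          vec_init[i] = x + i * s;
          vec_step[i] = vf * s;
        }
    }
}
```

In the nested case each lane keeps stepping by S because the inner-loop
iterations are not themselves split across lanes; only the outer-loop
iterations are.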
* Re: [patch] [4.3 projects] outer-loop vectorization patch 1/n
2007-08-08 21:23 [patch] [4.3 projects] outer-loop vectorization patch 1/n Dorit Nuzman
@ 2007-08-09 11:58 ` Dorit Nuzman
2007-08-12 15:02 ` Dorit Nuzman
1 sibling, 0 replies; 3+ messages in thread
From: Dorit Nuzman @ 2007-08-09 11:58 UTC (permalink / raw)
To: Dorit Nuzman; +Cc: gcc-patches
>
> Hi,
>
> This patch is the first part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds initial
> support for outer-loop vectorization. It basically brings over this patch:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html, along with some
> fixes that went in later.
> This patch can vectorize outer-loops only if there are no memory-references
> in the inner-loop.
>
> The patch includes the following changes to the vectorizer:
>
> 1) So far we supported single-BB loops (+empty latch), so the order in
> which we traversed the loop BBs did not matter. Now, it does - we sort the
> BBs in DFS order (since we don't allow if's in the loop, this should
> guarantee visiting defs before their uses).
>
> 2) vect_analyze_loop_form was extended to allow a restricted form of
> outer-loops. We currently support doubly-nested loops that consist of a
> header, a single inner(most)-loop, a tail, and an empty latch (5 BBs all
> together).
>
The following bit was missing from vect_analyze_loop_form: it's not enough
to check that the inner-loop bound is countable; it also needs to be
invariant in the outer-loop:
*************** vect_analyze_loop_form (struct loop *loo
*** 3052,3057 ****
--- 3080,3095 ----
return NULL;
}
+ if (!expr_invariant_in_loop_p (loop,
+ LOOP_VINFO_NITERS (inner_loop_vinfo)))
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump,
+ "not vectorized: inner-loop count not invariant.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
if (loop->num_nodes != 5)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
Index: vect-outer-2d.c
===================================================================
--- vect-outer-2d.c (revision 0)
+++ vect-outer-2d.c (revision 0)
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_float } */
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 40
+float image[N][N][N+1] __attribute__ ((__aligned__(16)));
+
+void
+foo (){
+ int i,j,k;
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < i+1; j++) {
+ image[k][j][i] = j+i+k;
+ }
+ }
+ }
+}
+
+int main (void)
+{
+ check_vect ();
+ int i, j, k;
+
+ foo ();
+
+ for (k=0; k<N; k++) {
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < i+1; j++) {
+ if (image[k][j][i] != j+i+k)
+ abort ();
+ }
+ }
+ }
+
+ return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
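
To illustrate why the bound has to be invariant, here is a minimal sketch
that is not part of the patch; the function names sum_rectangular and
sum_triangular are made up for this example. When the outer loop is
vectorized, VF consecutive outer iterations run in lockstep through one
shared inner loop, so all of them need the same inner trip count. The
first function satisfies that; the second has the same shape as the
testcase above (bound i+1) and does not.

```c
#include <assert.h>  /* only needed for the checks in the usage note */

#define N 8

/* Inner-loop bound is N: invariant in the outer loop, so every outer
   iteration executes the inner loop the same number of times.  */
int
sum_rectangular (int out[N])
{
  int total = 0;
  for (int i = 0; i < N; i++)
    {
      int sum = 0;
      for (int j = 0; j < N; j++)
        sum += j;
      out[i] = sum;
      total += sum;
    }
  return total;
}

/* Inner-loop bound is i+1: countable, but it changes with the outer
   induction variable, so neighbouring outer iterations would need
   different inner trip counts.  */
int
sum_triangular (int out[N])
{
  int total = 0;
  for (int i = 0; i < N; i++)
    {
      int sum = 0;
      for (int j = 0; j < i + 1; j++)
        sum += j;
      out[i] = sum;
      total += sum;
    }
  return total;
}
```

Both functions are plain scalar C; the point is only the difference in the
inner-loop bound, which is what the added check is meant to reject.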
dorit
> 3) vect_analyze_loop_form calls a new function - vect_analyze_loop_1 - to
> do a few analyses on the inner-loop (currently only one analysis:
> analyze_loop_form), and to build a loop_info for the inner-loop. It is
> destroyed soon after, but w/o destroying the stmt_info's that were set up
> for the inner-loop stmts. Maybe later we'll keep the inner-loop_info
> around, if needed.
>
> 4) Support for outer-loops breaks the assumption that phi nodes are only in
> the loop-header, and represent a scalar-cycle (induction or reduction). In
> outer-loops we also have phi-nodes inside the loop - these are the
> loop-closed phis after the inner-loop. This required a way to distinguish
> between these two kinds of phis (we use 'is_loop_header_bb_p' for that),
> and a few small changes in several places:
> o new_stmt_vec_info: different def-type initialization for the two kinds of
> phis
> o vect_is_simple_reduction: the uses that are not the reduction-variable
> can now be defined by a phi, though not a loop-header phi.
> o vect_recog_dot_prod_pattern: a vect_loop_def might be a phi, and not
> necessarily a gimple_modify_stmt.
> o vect_get_vec_def_for_oprnd: a vect_loop_def can be a phi node, and not
> necessarily a gimple_modify_stmt.
>
> 5) the enum "relevant" has two new values -
> vect_used_in_outer[_by_reduction], which are propagated during the
> mark_relevant pass.
>
> 6) since we don't yet support multiple-data-types in the inner-loop, we
> check in all relevant places, that this is not the case.
>
> The more significant changes are to vectorization of reduction and
> induction. In both cases we need to be aware of whether the
> induction/reduction-phi that we are vectorizing is in the same nest that is
> being vectorized, or is 'nested_in_vect_loop' (is inside the inner-loop
> while vectorizing the outer-loop):
>
> 7) vectorization of induction: In get_initial_def_for_induction, if this is
> a 'nested_in_vect_loop' case, then:
> o the initialization vector can be obtained using
> vect_get_vec_def_for_operand (does not need to be built from scratch).
> o the vector that holds the step of the vectorized induction is {S,S,S,S}
> rather than {VF*S,VF*S,VF*S,VF*S} (where S is the step of the induction),
> because in the vectorized inner-loop we are advancing sequentially (though
> in parallel for VF outer-loop iterations).
> o the final vector for inductions is recorded in the corresponding
> loop-exit phi (of the inner-loop) so that we can easily obtain it when we
> vectorize stmts in the outer-loop that use it.
>
> 8) vectorization of reduction: The main thing here is that we don't need to
> reduce the reduction to a single result; the final vector of partial
> results will feed the vector operations that may use it in the outer-loop.
> So:
> o In get_initial_def_for_reduction, we may return a vector for the epilog
> adjustment, rather than a scalar.
> o epilog_for_reduction - skip the part that computes the final scalar
> result in case this is a 'nested_in_vect_loop' case.
> o and in vectorizable_reduction, we don't check that the reduction is
> LIVE_P anymore (used out of the loop), because it may not be used outside
> the (outer) loop, but used inside the outer-loop (so as far as the inner-loop
> reduction is concerned, it is used_in_outer_loop, but not live).
>
> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.
>
> I will wait at least a week to give people a chance to review and comment.
>
> thanks,
> dorit
>
> ChangeLog:
>
> * tree-vectorizer.h (vect_is_simple_reduction): Takes a
> loop_vec_info
> as argument instead of struct loop.
> (nested_in_vect_loop_p): New function.
> (vect_relevant): Add enum values vect_used_in_outer_by_reduction
> and
> vect_used_in_outer.
> (is_loop_header_bb_p): New. Used to differentiate loop-header phis
> from other phis in the loop.
> (destroy_loop_vec_info): Add additional argument to declaration.
>
> * tree-vectorizer.c (supportable_widening_operation): Also check if
> nested_in_vect_loop_p (don't allow changing the order in this
> case).
> (vect_is_simple_reduction): Takes a loop_vec_info as argument
> instead
> of struct loop. Call nested_in_vect_loop_p and don't require
> flag_unsafe_math_optimizations if it returns true.
> * tree-vectorizer.c (new_stmt_vec_info): When setting def_type for
> phis differentiate loop-header phis from other phis.
> (bb_in_loop_p): New function.
> (new_loop_vec_info): Inner-loop phis already have a stmt_vinfo, so
> just update their loop_vinfo. Order of BB traversal now matters - call
> dfs_enumerate_from with bb_in_loop_p.
> (destroy_loop_vec_info): Takes additional argument to control
> whether
> stmt_vinfo of the loop stmts should be destroyed as well.
> (vect_is_simple_reduction): Allow the "non-reduction" use of a
> reduction stmt to be defined by a non loop-header phi.
> (vectorize_loops): Call destroy_loop_vec_info with additional
> argument.
>
> * tree-vect-transform.c (vectorizable_reduction): Call
> nested_in_vect_loop_p. Check for multitypes in the inner-loop.
> (vectorizable_call): Likewise.
> (vectorizable_conversion): Likewise.
> (vectorizable_operation): Likewise.
> (vectorizable_type_promotion): Likewise.
> (vectorizable_type_demotion): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_live_operation): Likewise.
> (vectorizable_reduction): Likewise. Also pass loop_info to
> vect_is_simple_reduction instead of loop.
> (vect_init_vector): Call nested_in_vect_loop_p.
> (get_initial_def_for_reduction): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> (vect_init_vector): Check which loop to work with, in case there's
> an inner-loop.
> (get_initial_def_for_induction): Extend to handle outer-loop
> vectorization. Fix indentation.
> (vect_get_vec_def_for_operand): Support phis in the case
> vect_loop_def.
> In the case vect_induction_def get the vector def from the
> induction
> phi node, instead of calling get_initial_def_for_induction.
> (get_initial_def_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_create_epilog_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_transform_loop): Change assert to just skip this case. Add a
> dump printout.
> (vect_finish_stmt_generation): Add a couple asserts.
>
> (vect_estimate_min_profitable_iters): Multiply
> cost of inner-loop stmts (in outer-loop vectorization) by estimated
> inner-loop bound.
> (vect_model_reduction_cost): Don't add reduction epilogue cost in
> case
> this is an inner-loop reduction in outer-loop vectorization.
>
> * tree-vect-analyze.c (vect_analyze_scalar_cycles_1): New function.
> Same code as what used to be vect_analyze_scalar_cycles, only with
> additional argument loop, and loop_info passed to
> vect_is_simple_reduction instead of loop.
> (vect_analyze_scalar_cycles): Code factored out into
> vect_analyze_scalar_cycles_1. Call it for each relevant loop-nest.
> Updated documentation.
> (analyze_operations): Check for inner-loop loop-closed exit-phis
> during
> outer-loop vectorization that are live or not used in the
> outerloop,
> cause this requires special handling.
> (vect_enhance_data_refs_alignment): Don't consider versioning for
> nested-loops.
> (vect_analyze_data_refs): Check that there are no datarefs in the
> inner-loop.
> (vect_mark_stmts_to_be_vectorized): Also consider
> vect_used_in_outer
> and vect_used_in_outer_by_reduction cases.
> (process_use): Also consider the case of outer-loop stmt defining
> an
> inner-loop stmt and vice versa.
> (vect_analyze_loop_1): New function.
> (vect_analyze_loop_form): Extend, to allow a restricted form of
> nested
> loops. Call vect_analyze_loop_1.
> (vect_analyze_loop): Skip (inner-)loops within outer-loops that
> have
> been vectorized. Call destroy_loop_vec_info with additional
> argument.
>
> * tree-vect-patterns.c (vect_recog_widen_sum_pattern): Don't allow
> in the inner-loop when doing outer-loop vectorization. Add
> documentation and printout.
> (vect_recog_dot_prod_pattern): Likewise. Also add check for
> GIMPLE_MODIFY_STMT (in case we encounter a phi in the loop).
>
> testsuite/ChangeLog:
>
> * gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
> and -fno-tree-reassoc.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-1.c: New test.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-6.c: New test.
> * gcc.dg/vect/no-scevccp-outer-7.c: New test.
> * gcc.dg/vect/no-scevccp-outer-8.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-11.c: New test.
> * gcc.dg/vect/no-scevccp-outer-12.c: New test.
> * gcc.dg/vect/no-scevccp-outer-13.c: New test.
> * gcc.dg/vect/no-scevccp-outer-14.c: New test.
> * gcc.dg/vect/no-scevccp-outer-15.c: New test.
> * gcc.dg/vect/no-scevccp-outer-16.c: New test.
> * gcc.dg/vect/no-scevccp-outer-17.c: New test.
> * gcc.dg/vect/no-scevccp-outer-18.c: New test.
> * gcc.dg/vect/no-scevccp-outer-19.c: New test.
> * gcc.dg/vect/no-scevccp-outer-20.c: New test.
> * gcc.dg/vect/no-scevccp-outer-21.c: New test.
> * gcc.dg/vect/no-scevccp-outer-22.c: New test.
>
> (See attached file: mainlineouterloopdiff1t.txt)[attachment
> "mainlineouterloopdiff1t.txt" deleted by Dorit Nuzman/Haifa/IBM]
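
As an illustration of item 7 in the quoted description, the following
sketch models the two step vectors in plain C. It is not taken from the
patch: the vec4 type, VF = 4 and all function names are assumptions made
for this example, with a small struct standing in for a vector register.
In innermost-loop vectorization the lanes hold VF consecutive iterations
of the same loop, so each vector iteration advances by VF*S. In the
'nested_in_vect_loop' case each lane belongs to a different outer-loop
iteration and the inner loop still advances one iteration at a time, so
the step is S in every lane.

```c
#include <assert.h>

#define VF 4

/* Stand-in for a vector register with VF integer lanes.  */
typedef struct { int lane[VF]; } vec4;

vec4
vec_splat (int x)
{
  vec4 v;
  for (int k = 0; k < VF; k++)
    v.lane[k] = x;
  return v;
}

vec4
vec_add (vec4 a, vec4 b)
{
  vec4 v;
  for (int k = 0; k < VF; k++)
    v.lane[k] = a.lane[k] + b.lane[k];
  return v;
}

/* Innermost-loop case: lane k holds iteration k of the same loop,
   so the initial vector is {init, init+S, init+2S, init+3S}.  */
vec4
iv_init_innermost (int init, int step)
{
  vec4 v;
  for (int k = 0; k < VF; k++)
    v.lane[k] = init + k * step;
  return v;
}

/* Innermost-loop case: one vector iteration covers VF scalar ones.  */
vec4
iv_step_innermost (int step)
{
  return vec_splat (VF * step);
}

/* Nested case: the inner loop advances sequentially, one scalar
   iteration per vector iteration, in all VF lanes at once.  */
vec4
iv_step_nested (int step)
{
  return vec_splat (step);
}

/* Apply the step vector n times.  */
vec4
iv_advance (vec4 iv, vec4 step, int n)
{
  assert (n >= 0);
  for (int t = 0; t < n; t++)
    iv = vec_add (iv, step);
  return iv;
}
```

In the nested case the initial vector is simply whatever vector already
holds the start value for each outer iteration (a splat of 0 for a loop
that starts at j = 0), which matches the remark that it does not need to
be built from scratch.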
* Re: [patch] [4.3 projects] outer-loop vectorization patch 1/n
2007-08-08 21:23 [patch] [4.3 projects] outer-loop vectorization patch 1/n Dorit Nuzman
2007-08-09 11:58 ` Dorit Nuzman
@ 2007-08-12 15:02 ` Dorit Nuzman
1 sibling, 0 replies; 3+ messages in thread
From: Dorit Nuzman @ 2007-08-12 15:02 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 11498 bytes --]
Attached below is the updated patch (part 1, updated to a more recent
snapshot).
Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
and tested on the vectorizer testcases.
dorit
(See attached file: updated-outerloop-patch1.txt)
> Hi,
> This patch is the first part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds initial
> support for outer-loop vectorization. It basically brings over this patch:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html, along with some
> fixes that went in later.
> This patch can vectorize outer-loops only if there are no memory-references
> in the inner-loop.
> The patch includes the following changes to the vectorizer:
> 1) So far we supported single-BB loops (+empty latch), so the order by
> which we traversed the loop BBs did not matter. Now, it does - we sort the
> BBs in dfs order (since we don't allow if's in the loop, this should
> guarantee visiting defs before their uses).
> 2) vect_analyze_loop_form was extended to allow a restricted form of
> outer-loops. We currently support doubly-nested loops that consist of a
> header, a single inner(most)-loop, a tail, and an empty latch (5 BBs all
> together).
> 3) vect_analyze_loop_form calls a new function - vect_analyze_loop_1 - to
> do a few analyses on the inner-loop (currently only one analysis:
> analyze_loop_form), and to build a loop_info for the inner-loop. It is
> destroyed soon after, but w/o destroying the stmt_info's that were set up
> for the inner-loop stmts. Maybe later we'll keep the inner-loop_info
> around, if needed.
> 4) Support for outer-loops breaks the assumption that phi nodes are only in
> the loop-header, and represent a scalar-cycle (induction or reduction). In
> outer-loops we also have phi-nodes inside the loop - these are the
> loop-closed phis after the inner-loop. This required a way to distinguish
> between these two kinds of phis (we use 'is_loop_header_bb_p' for that),
> and a few small changes in several places:
> o new_stmt_vec_info: different def-type initialization for the two kinds of
> phis
> o vect_is_simple_reduction: the uses that are not the reduction-variable
> can now be defined by a phi, though not a loop-header phi.
> o vect_recog_dot_prod_pattern: a vect_loop_def might be a phi, and not
> necessarily a gimple_modify_stmt.
> o vect_get_vec_def_for_oprnd: a vect_loop_def can be a phi node, and not
> necessarily a gimple_modify_stmt.
> 5) the enum "relevant" has two new values -
> vect_used_in_outer[_by_reduction], which are propagated during the
> mark_relevant pass.
> 6) since we don't yet support multiple-data-types in the inner-loop, we
> check in all relevant places, that this is not the case.
> The more significant changes are to vectorization of reduction and
> induction. In both cases we need to be aware of whether the
> induction/reduction-phi that we are vectorizing is in the same nest that is
> being vectorized, or is 'nested_in_vect_loop' (is inside the inner-loop
> while vectorizing the outer-loop):
> 7) vectorization of induction: In get_initial_def_for_induction, if this is
> a 'nested_in_vect_loop' case, then:
> o the initialization vector can be obtained using
> vect_get_vec_def_for_operand (does not need to be built from scratch).
> o the vector that holds the step of the vectorized induction is {S,S,S,S}
> rather than {VF*S,VF*S,VF*S,VF*S} (where S is the step of the induction),
> because in the vectorized inner-loop we are advancing sequentially (though
> in parallel for VF outer-loop iterations).
> o the final vector for inductions is recorded in the corresponding
> loop-exit phi (of the inner-loop) so that we can easily obtain it when we
> vectorize stmts in the outer-loop that use it.
> 8) vectorization of reduction: The main thing here is that we don't need to
> reduce the reduction to a single result; the final vector of partial
> results will feed the vector operations that may use it in the outer-loop.
> So:
> o In get_initial_def_for_reduction, we may return a vector for the epilog
> adjustment, rather than a scalar.
> o epilog_for_reduction - skip the part that computes the final scalar
> result in case this is a 'nested_in_vect_loop' case.
> o and in vectorizable_reduction, we don't check that the reduction is
> LIVE_P anymore (used out of the loop), because it may not be used outside
> the (outer) loop, but used inside the outer-loop (so as far as the inner-loop
> reduction is concerned, it is used_in_outer_loop, but not live).
> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.
> I will wait at least a week to give people a chance to review and comment.
> thanks,
> dorit
> ChangeLog:
> * tree-vectorizer.h (vect_is_simple_reduction): Takes a
> loop_vec_info
> as argument instead of struct loop.
> (nested_in_vect_loop_p): New function.
> (vect_relevant): Add enum values vect_used_in_outer_by_reduction
> and
> vect_used_in_outer.
> (is_loop_header_bb_p): New. Used to differentiate loop-header phis
> from other phis in the loop.
> (destroy_loop_vec_info): Add additional argument to declaration.
>
> * tree-vectorizer.c (supportable_widening_operation): Also check if
> nested_in_vect_loop_p (don't allow changing the order in this
> case).
> (vect_is_simple_reduction): Takes a loop_vec_info as argument
> instead
> of struct loop. Call nested_in_vect_loop_p and don't require
> flag_unsafe_math_optimizations if it returns true.
> * tree-vectorizer.c (new_stmt_vec_info): When setting def_type for
> phis differentiate loop-header phis from other phis.
> (bb_in_loop_p): New function.
> (new_loop_vec_info): Inner-loop phis already have a stmt_vinfo, so
> just
> update their loop_vinfo. Order of BB traversal now matters - call
> dfs_enumerate_from with bb_in_loop_p.
> (destroy_loop_vec_info): Takes additional argument to control
> whether
> stmt_vinfo of the loop stmts should be destroyed as well.
> (vect_is_simple_reduction): Allow the "non-reduction" use of a
> reduction stmt to be defined by a non loop-header phi.
> (vectorize_loops): Call destroy_loop_vec_info with additional
> argument.
> * tree-vect-transform.c (vectorizable_reduction): Call
> nested_in_vect_loop_p. Check for multitypes in the inner-loop.
> (vectorizable_call): Likewise.
> (vectorizable_conversion): Likewise.
> (vectorizable_operation): Likewise.
> (vectorizable_type_promotion): Likewise.
> (vectorizable_type_demotion): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_live_operation): Likewise.
> (vectorizable_reduction): Likewise. Also pass loop_info to
> vect_is_simple_reduction instead of loop.
> (vect_init_vector): Call nested_in_vect_loop_p.
> (get_initial_def_for_reduction): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> (vect_init_vector): Check which loop to work with, in case there's
> an
> inner-loop.
> (get_initial_def_for_induction): Extend to handle outer-loop
> vectorization. Fix indentation.
> (vect_get_vec_def_for_operand): Support phis in the case
> vect_loop_def.
> In the case vect_induction_def get the vector def from the
> induction
> phi node, instead of calling get_initial_def_for_induction.
> (get_initial_def_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_create_epilog_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_transform_loop): Change assert to just skip this case. Add a
> dump printout.
> (vect_finish_stmt_generation): Add a couple asserts.
>
> (vect_estimate_min_profitable_iters): Multiply
> cost of inner-loop stmts (in outer-loop vectorization) by estimated
> inner-loop bound.
> (vect_model_reduction_cost): Don't add reduction epilogue cost in
> case
> this is an inner-loop reduction in outer-loop vectorization.
>
> * tree-vect-analyze.c (vect_analyze_scalar_cycles_1): New function.
> Same code as what used to be vect_analyze_scalar_cycles, only with
> additional argument loop, and loop_info passed to
> vect_is_simple_reduction instead of loop.
> (vect_analyze_scalar_cycles): Code factored out into
> vect_analyze_scalar_cycles_1. Call it for each relevant loop-nest.
> Updated documentation.
> (analyze_operations): Check for inner-loop loop-closed exit-phis
> during
> outer-loop vectorization that are live or not used in the
> outerloop,
> cause this requires special handling.
> (vect_enhance_data_refs_alignment): Don't consider versioning for
> nested-loops.
> (vect_analyze_data_refs): Check that there are no datarefs in the
> inner-loop.
> (vect_mark_stmts_to_be_vectorized): Also consider
> vect_used_in_outer
> and vect_used_in_outer_by_reduction cases.
> (process_use): Also consider the case of outer-loop stmt defining
> an
> inner-loop stmt and vice versa.
> (vect_analyze_loop_1): New function.
> (vect_analyze_loop_form): Extend, to allow a restricted form of
> nested
> loops. Call vect_analyze_loop_1.
> (vect_analyze_loop): Skip (inner-)loops within outer-loops that
> have
> been vectorized. Call destroy_loop_vec_info with additional
> argument.
> * tree-vect-patterns.c (vect_recog_widen_sum_pattern): Don't allow
> in the inner-loop when doing outer-loop vectorization. Add
> documentation and printout.
> (vect_recog_dot_prod_pattern): Likewise. Also add check for
> GIMPLE_MODIFY_STMT (in case we encounter a phi in the loop).
>
> testsuite/ChangeLog:
> * gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
> and -fno-tree-reassoc.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-1.c: New test.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-6.c: New test.
> * gcc.dg/vect/no-scevccp-outer-7.c: New test.
> * gcc.dg/vect/no-scevccp-outer-8.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-11.c: New test.
> * gcc.dg/vect/no-scevccp-outer-12.c: New test.
> * gcc.dg/vect/no-scevccp-outer-13.c: New test.
> * gcc.dg/vect/no-scevccp-outer-14.c: New test.
> * gcc.dg/vect/no-scevccp-outer-15.c: New test.
> * gcc.dg/vect/no-scevccp-outer-16.c: New test.
> * gcc.dg/vect/no-scevccp-outer-17.c: New test.
> * gcc.dg/vect/no-scevccp-outer-18.c: New test.
> * gcc.dg/vect/no-scevccp-outer-19.c: New test.
> * gcc.dg/vect/no-scevccp-outer-20.c: New test.
> * gcc.dg/vect/no-scevccp-outer-21.c: New test.
> * gcc.dg/vect/no-scevccp-outer-22.c: New test.
>
> (See attached file: mainlineouterloopdiff1t.txt)
>
> #### mainlineouterloopdiff1t.txt has been deleted (was already in
> repository MyAttachments Repository ->) from this note on 11 August
> 2007 by Dorit Nuzman
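
As an illustration of item 8 in the quoted description, the sketch below
contrasts a scalar loop nest with a hand-written model of what outer-loop
vectorization of the inner reduction amounts to. It is not code from the
patch: N, VF = 4 and both function names are assumptions made for this
example, and a small array stands in for a vector register. The point is
that the vector of partial sums is never collapsed into one scalar; lane l
already holds the complete sum for outer iteration i+l and is consumed
directly by the outer-loop store.

```c
#include <assert.h>

#define N 8
#define VF 4

/* Scalar reference: the inner loop is a sum reduction with no memory
   references; its result is used in the outer loop.  */
void
inner_sums_scalar (int a[N], int n)
{
  for (int i = 0; i < N; i++)
    {
      int sum = i;
      for (int j = 0; j < n; j++)
        sum += j;
      a[i] = sum;
    }
}

/* Model of outer-loop vectorization: VF outer iterations per step.  */
void
inner_sums_outer_vectorized (int a[N], int n)
{
  assert (N % VF == 0);   /* no epilogue loop in this sketch */
  for (int i = 0; i < N; i += VF)
    {
      int vsum[VF];

      /* Initial vector: one start value per outer iteration.  */
      for (int l = 0; l < VF; l++)
        vsum[l] = i + l;

      /* The inner loop runs sequentially; every lane adds the same j.  */
      for (int j = 0; j < n; j++)
        for (int l = 0; l < VF; l++)
          vsum[l] += j;

      /* No horizontal reduction: each lane is a finished result.  */
      for (int l = 0; l < VF; l++)
        a[i + l] = vsum[l];
    }
}
```

Calling both functions with the same n fills the two arrays with identical
values, which is the property the no-scevccp-outer tests check after
running the vectorized function.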
[-- Attachment #2: updated-outerloop-patch1.txt --]
[-- Type: text/plain, Size: 137634 bytes --]
Index: testsuite/gcc.dg/vect/vect-widen-mult-sum.c
===================================================================
*** testsuite/gcc.dg/vect/vect-widen-mult-sum.c (revision 127371)
--- testsuite/gcc.dg/vect/vect-widen-mult-sum.c (working copy)
*************** int main (void)
*** 42,45 ****
--- 42,46 ----
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-7.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-7.c (revision 0)
***************
*** 0 ****
--- 1,75 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+ unsigned short coef[N];
+ unsigned short a[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j;
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = 2*i;
+ coef[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ if (a[i] != bar ())
+ abort ();
+ sum += ((unsigned int) in[i] * (unsigned int) coef[i]) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10a.c (revision 0)
***************
*** 0 ****
--- 1,58 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ j = 0;
+ do {
+ sum += j;
+ } while (++j < n);
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-18.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-18.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum;
+ a[2*i+1] = 2*sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum || a[2*i+1] != 2*sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-8.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-8.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+
+ int
+ foo (int *a){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (a);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-11.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-11.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < n; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-10b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-10b.c (revision 0)
***************
*** 0 ****
--- 1,57 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum,x,y;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo (N-1);
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N-1; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-19.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-19.c (revision 0)
***************
*** 0 ****
--- 1,52 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short a[N];
+ unsigned int b[N];
+
+ int
+ foo (){
+ unsigned short i,j;
+ unsigned short sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (unsigned int)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ short sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (unsigned int)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-20.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-20.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum,x,y;
+
+ for (i = 0; i < N/2; i++) {
+ sum = 0;
+ x = b[2*i];
+ y = b[2*i+1];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[2*i] = sum + x;
+ a[2*i+1] = sum + y;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N/2; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[2*i] != sum + b[2*i] || a[2*i+1] != sum + b[2*i+1])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target { vect_interleave && vect_extract_even_odd } } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-1.c (revision 0)
***************
*** 0 ****
--- 1,23 ----
+ /* { dg-do compile } */
+
+ #define N 40
+ signed short image[N][N];
+ signed short block[N][N];
+
+ /* memory references in the inner-loop */
+
+ unsigned int
+ foo (){
+ int i,j;
+ unsigned int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += (image[i][j] - block[i][j]);
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-12.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-12.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ int a[N];
+ short b[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ b[i] = (short)sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum || b[i] != (short)sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* Until we support multiple types in the inner loop */
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-21.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-21.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = i;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-2.c (revision 0)
***************
*** 0 ****
--- 1,18 ----
+ /* { dg-do compile } */
+ #define N 40
+
+ int
+ foo (){
+ int i,j;
+ int diff = 0;
+
+ for (i = 0; i < N; i++) {
+ for (j = 0; j < N; j++) {
+ diff += j;
+ }
+ }
+ return diff;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-13.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-13.c (revision 0)
***************
*** 0 ****
--- 1,67 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 16
+
+ unsigned short in[N];
+
+ unsigned int
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> scale;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned int sum = 0;
+ unsigned int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ in[i] = i;
+ }
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar ();
+ sum += ((unsigned int) in[i] * (unsigned int) sum_j) >> 2;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-1.c (revision 0)
***************
*** 0 ****
--- 1,50 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j,k=0;
+ int sum,x;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += (i + j);
+ i++;
+ }
+ a[k++] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++){
+ sum += (j + i);
+ i++;
+ }
+ if (a[k++] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-22.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-22.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ /* inner-loop index j used after the inner-loop */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += j;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-3.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-3.c (revision 0)
***************
*** 0 ****
--- 1,51 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ /* inner-loop step > 1 */
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j+=2) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-14.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-14.c (revision 0)
***************
*** 0 ****
--- 1,61 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 64
+
+ unsigned short
+ foo (short scale){
+ int i;
+ unsigned short j;
+ unsigned short sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ sum += sum_j;
+ }
+ return sum;
+ }
+
+ unsigned short
+ bar (void)
+ {
+ unsigned short j;
+ unsigned short sum_j;
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ return sum_j;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ unsigned short sum = 0;
+ unsigned short res;
+
+ check_vect ();
+
+ res = foo (2);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum_j = bar();
+ sum += sum_j;
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { target vect_widen_mult_hi_to_si } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-2.c (revision 0)
***************
*** 0 ****
--- 1,49 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+ int a[200*N];
+
+ void
+ foo (){
+ int i,j;
+ int sum,s=0;
+
+ for (i = 0; i < 200*N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += (i + j);
+ i++;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum,s=0;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<200*N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++){
+ sum += (j + i);
+ i++;
+ }
+ if (a[i] != sum)
+ abort ();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (revision 127356)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c (working copy)
***************
*** 1,34 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 26
-
- int main1 (int X)
- {
- int s = X;
- int i;
-
- /* vectorization of reduction with induction.
- Need -fno-tree-scev-cprop or else the loop is eliminated. */
- for (i = 0; i < N; i++)
- s += i;
-
- return s;
- }
-
- int main (void)
- {
- int s;
- check_vect ();
-
- s = main1 (3);
- if (s != 328)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/no-scevccp-outer-4.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-4.c (revision 0)
***************
*** 0 ****
--- 1,55 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ /* induction variable k advances through inner and outer loops. */
+
+ int
+ foo (int n){
+ int i,j,k=0;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j+=2) {
+ sum += k++;
+ }
+ a[i] = sum + j;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j+=2)
+ sum += k++;
+ if (a[i] != sum + j)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-15.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-15.c (revision 0)
***************
*** 0 ****
--- 1,48 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int x){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum + i + x;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ foo (3);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum + i + 3)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-3.c (revision 0)
***************
*** 0 ****
--- 1,48 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum,x;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += (i + j);
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++){
+ sum += (j + i);
+ }
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (revision 127356)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c (working copy)
***************
*** 1,49 ****
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 16
-
- int main1 ()
- {
- int arr1[N];
- int k = 0;
- int m = 3, i = 0;
-
- /* Vectorization of induction that is used after the loop.
- Currently vectorizable because scev_ccp disconnects the
- use-after-the-loop from the iv def inside the loop. */
-
- do {
- k = k + 2;
- arr1[i] = k;
- m = m + k;
- i++;
- } while (i < N);
-
- /* check results: */
- for (i = 0; i < N; i++)
- {
- if (arr1[i] != 2+2*i)
- abort ();
- }
-
- return m + k;
- }
-
- int main (void)
- {
- int res;
-
- check_vect ();
-
- res = main1 ();
- if (res != 32 + 275)
- abort ();
-
- return 0;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { xfail *-*-* } } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/vect.exp
===================================================================
*** testsuite/gcc.dg/vect/vect.exp (revision 127371)
--- testsuite/gcc.dg/vect/vect.exp (working copy)
*************** dg-runtest [lsort [glob -nocomplain $src
*** 176,183 ****
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-tree-scev-cprop-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
--- 176,195 ----
# -fno-tree-scev-cprop
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-vect-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-outer-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
!
! # -fno-tree-scev-cprop -fno-tree-reassoc
! set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
! lappend DEFAULT_VECTCFLAGS "-fno-tree-scev-cprop" "-fno-tree-reassoc"
! dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/no-scevccp-noreassoc-*.\[cS\]]] \
! "" $DEFAULT_VECTCFLAGS
# -fno-tree-dominator-opts
set DEFAULT_VECTCFLAGS $SAVED_DEFAULT_VECTCFLAGS
Index: testsuite/gcc.dg/vect/no-scevccp-outer-5.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-5.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] += sum + i;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ a[i] = i;
+ aa[i] = i;
+ }
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != aa[i] + sum + i)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9a.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9a.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ j = 0;
+ do {
+ sum += j;
+ }while (++j < n);
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-16.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-16.c (revision 0)
***************
*** 0 ****
--- 1,62 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += i;
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-4.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-4.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-4.c (revision 0)
***************
*** 0 ****
--- 1,56 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int
+ foo (){
+ int i,j;
+ int sum,s=0;
+
+ for (i = 0; i < 200*N; i++) {
+ sum = 0;
+ for (j = 0; j < N; j++) {
+ sum += (i + j);
+ i++;
+ }
+ s += sum;
+ }
+ return s;
+ }
+
+ int bar (int i, int j)
+ {
+ return (i + j);
+ }
+
+ int main (void)
+ {
+ int i,j,k=0;
+ int sum,s=0;
+ int res;
+
+ check_vect ();
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<200*N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++){
+ sum += bar (i, j);
+ i++;
+ }
+ s += sum;
+ }
+ if (res != s)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-3.c
===================================================================
*** testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-3.c (revision 127356)
--- testsuite/gcc.dg/vect/no-tree-scev-cprop-vect-iv-3.c (working copy)
***************
*** 1,27 ****
- /* { dg-do compile } */
- /* { dg-require-effective-target vect_int } */
-
- #include <stdarg.h>
- #include "tree-vect.h"
-
- #define N 26
-
- unsigned int main1 ()
- {
- unsigned short i;
- unsigned int intsum = 0;
-
- /* vectorization of reduction with induction, and widenning sum:
- sum shorts into int.
- Need -fno-tree-scev-cprop or else the loop is eliminated. */
- for (i = 0; i < N; i++)
- {
- intsum += i;
- }
-
- return intsum;
- }
-
- /* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
- /* { dg-final { scan-tree-dump-times "vect_recog_widen_sum_pattern: detected" 1 "vect" { target vect_widen_sum_hi_to_si } } } */
- /* { dg-final { cleanup-tree-dump "vect" } } */
--- 0 ----
Index: testsuite/gcc.dg/vect/no-scevccp-outer-6.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-6.c (revision 0)
***************
*** 0 ****
--- 1,56 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int
+ foo (int * __restrict__ b, int k){
+ int i,j;
+ int sum,x;
+ int a[N];
+
+ for (i = 0; i < N; i++) {
+ sum = b[i];
+ for (j = 0; j < N; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+
+ return a[k];
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int b[N];
+ int a[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ b[i] = i + 2;
+
+ for (i=0; i<N; i++)
+ a[i] = foo (b,i);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = b[i];
+ for (j = 0; j < N; j++){
+ sum += j;
+ }
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { scan-tree-dump-times "vect_recog_widen_mult_pattern: detected" 1 "vect" { xfail *-*-* } } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-9b.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-9b.c (revision 0)
***************
*** 0 ****
--- 1,53 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (int n){
+ int i,j;
+ int sum;
+
+ if (n<=0)
+ return 0;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ for (j = 0; j < n; j++) {
+ sum += j;
+ }
+ a[i] = sum;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+
+ check_vect ();
+
+ for (i=0; i<N; i++)
+ a[i] = i;
+
+ foo (N);
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += j;
+ if (a[i] != sum)
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-outer-17.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-outer-17.c (revision 0)
***************
*** 0 ****
--- 1,68 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+ int b[N];
+ int c[N];
+
+ int
+ foo (){
+ int i;
+ unsigned short j;
+ int sum = 0;
+ unsigned short sum_j;
+
+ for (i = 0; i < N; i++) {
+ int diff = b[i] - c[i];
+
+ sum_j = 0;
+ for (j = 0; j < N; j++) {
+ sum_j += j;
+ }
+ a[i] = sum_j + 5;
+
+ sum += diff;
+ }
+ return sum;
+ }
+
+ int main (void)
+ {
+ int i;
+ unsigned short j, sum_j;
+ int sum = 0;
+ int res;
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ b[i] = i;
+ c[i] = 2*i;
+ }
+
+ res = foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum += (b[i] - c[i]);
+
+ sum_j = 0;
+ for (j = 0; j < N; j++){
+ sum_j += j;
+ }
+ if (a[i] != sum_j + 5)
+ abort();
+ }
+ if (res != sum)
+ abort ();
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c
===================================================================
*** testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c (revision 0)
--- testsuite/gcc.dg/vect/no-scevccp-noreassoc-outer-5.c (revision 0)
***************
*** 0 ****
--- 1,54 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include <stdarg.h>
+ #include "tree-vect.h"
+
+ #define N 40
+
+ int a[N];
+
+ int
+ foo (){
+ int i,j;
+ int sum,x;
+
+ for (i = 0; i < N; i++) {
+ sum = 0;
+ x = a[i];
+ for (j = 0; j < N; j++) {
+ sum += (x + j);
+ }
+ a[i] = sum + i + x;
+ }
+ }
+
+ int main (void)
+ {
+ int i,j;
+ int sum;
+ int aa[N];
+
+ check_vect ();
+
+ for (i=0; i<N; i++){
+ a[i] = i;
+ aa[i] = i;
+ }
+
+ foo ();
+
+ /* check results: */
+ for (i=0; i<N; i++)
+ {
+ sum = 0;
+ for (j = 0; j < N; j++)
+ sum += (j + aa[i]);
+ if (a[i] != sum + i + aa[i])
+ abort();
+ }
+
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED." 1 "vect" } } */
+ /* { dg-final { cleanup-tree-dump "vect" } } */
Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c (revision 127356)
--- tree-vectorizer.c (working copy)
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1345,1351 ****
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
! if (TREE_CODE (stmt) == PHI_NODE)
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
--- 1345,1351 ----
STMT_VINFO_IN_PATTERN_P (res) = false;
STMT_VINFO_RELATED_STMT (res) = NULL;
STMT_VINFO_DATA_REF (res) = NULL;
! if (TREE_CODE (stmt) == PHI_NODE && is_loop_header_bb_p (bb_for_stmt (stmt)))
STMT_VINFO_DEF_TYPE (res) = vect_unknown_def_type;
else
STMT_VINFO_DEF_TYPE (res) = vect_loop_def;
*************** new_stmt_vec_info (tree stmt, loop_vec_i
*** 1364,1369 ****
--- 1364,1383 ----
}
+ /* Function bb_in_loop_p
+
+ Used as predicate for dfs order traversal of the loop bbs. */
+
+ static bool
+ bb_in_loop_p (basic_block bb, void *data)
+ {
+ struct loop *loop = (struct loop *)data;
+ if (flow_bb_inside_loop_p (loop, bb))
+ return true;
+ return false;
+ }
+
+
/* Function new_loop_vec_info.
Create and initialize a new loop_vec_info struct for LOOP, as well as
*************** new_loop_vec_info (struct loop *loop)
*** 1375,1392 ****
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
bbs = get_loop_body (loop);
! /* Create stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
--- 1389,1437 ----
loop_vec_info res;
basic_block *bbs;
block_stmt_iterator si;
! unsigned int i, nbbs;
res = (loop_vec_info) xcalloc (1, sizeof (struct _loop_vec_info));
+ LOOP_VINFO_LOOP (res) = loop;
bbs = get_loop_body (loop);
! /* Create/Update stmt_info for all stmts in the loop. */
for (i = 0; i < loop->num_nodes; i++)
{
basic_block bb = bbs[i];
tree phi;
+ /* BBs in a nested inner-loop will already have been processed (because
+ we will have called vect_analyze_loop_form for any nested inner-loop).
+ Therefore, for stmts in an inner-loop we just want to update the
+ STMT_VINFO_LOOP_VINFO field of their stmt_info to point to the new
+ loop_info of the outer-loop we are currently considering to vectorize
+ (instead of the loop_info of the inner-loop).
+ For stmts in other BBs we need to create a stmt_info from scratch. */
+ if (bb->loop_father != loop)
+ {
+ /* Inner-loop bb. */
+ gcc_assert (loop->inner && bb->loop_father == loop->inner);
+ for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
+ {
+ stmt_vec_info stmt_info = vinfo_for_stmt (phi);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
+ {
+ tree stmt = bsi_stmt (si);
+ stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info inner_loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (loop->inner == LOOP_VINFO_LOOP (inner_loop_vinfo));
+ STMT_VINFO_LOOP_VINFO (stmt_info) = res;
+ }
+ }
+ else
+ {
+ /* bb in current nest. */
for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi))
{
stmt_ann_t ann = get_stmt_ann (phi);
*************** new_loop_vec_info (struct loop *loop)
*** 1396,1411 ****
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann;
!
! ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
- LOOP_VINFO_LOOP (res) = loop;
LOOP_VINFO_BBS (res) = bbs;
- LOOP_VINFO_EXIT_COND (res) = NULL;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
--- 1441,1464 ----
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
! stmt_ann_t ann = stmt_ann (stmt);
set_stmt_info (ann, new_stmt_vec_info (stmt, res));
}
}
+ }
+
+ /* CHECKME: We want to visit all BBs before their successors (except for
+ latch blocks, for which this assertion wouldn't hold). In the simple
+ case of the loop forms we allow, a dfs order of the BBs would be the same
+ as reversed postorder traversal, so we are safe. */
+
+ free (bbs);
+ bbs = XCNEWVEC (basic_block, loop->num_nodes);
+ nbbs = dfs_enumerate_from (loop->header, 0, bb_in_loop_p,
+ bbs, loop->num_nodes, loop);
+ gcc_assert (nbbs == loop->num_nodes);
LOOP_VINFO_BBS (res) = bbs;
LOOP_VINFO_NITERS (res) = NULL;
LOOP_VINFO_COST_MODEL_MIN_ITERS (res) = 0;
LOOP_VINFO_VECTORIZABLE_P (res) = 0;
*************** new_loop_vec_info (struct loop *loop)
*** 1427,1433 ****
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo)
{
struct loop *loop;
basic_block *bbs;
--- 1480,1486 ----
stmts in the loop. */
void
! destroy_loop_vec_info (loop_vec_info loop_vinfo, bool clean_stmts)
{
struct loop *loop;
basic_block *bbs;
*************** destroy_loop_vec_info (loop_vec_info loo
*** 1443,1448 ****
--- 1496,1513 ----
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
+ if (!clean_stmts)
+ {
+ free (LOOP_VINFO_BBS (loop_vinfo));
+ free_data_refs (LOOP_VINFO_DATAREFS (loop_vinfo));
+ free_dependence_relations (LOOP_VINFO_DDRS (loop_vinfo));
+ VEC_free (tree, heap, LOOP_VINFO_MAY_MISALIGN_STMTS (loop_vinfo));
+
+ free (loop_vinfo);
+ loop->aux = NULL;
+ return;
+ }
+
for (j = 0; j < nbbs; j++)
{
basic_block bb = bbs[j];
*************** vect_is_simple_use (tree operand, loop_v
*** 1714,1721 ****
{
case PHI_NODE:
*def = PHI_RESULT (*def_stmt);
- gcc_assert (*dt == vect_induction_def || *dt == vect_reduction_def
- || *dt == vect_invariant_def);
break;
case GIMPLE_MODIFY_STMT:
--- 1779,1784 ----
*************** supportable_widening_operation (enum tre
*** 1756,1761 ****
--- 1819,1826 ----
enum tree_code *code1, enum tree_code *code2)
{
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
bool ordered_p;
enum machine_mode vec_mode;
enum insn_code icode1, icode2;
*************** supportable_widening_operation (enum tre
*** 1778,1786 ****
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8]. */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction)
ordered_p = false;
else
ordered_p = true;
--- 1843,1857 ----
Some targets can take advantage of this and generate more efficient code.
For example, targets like Altivec, that support widen_mult using a sequence
of {mult_even,mult_odd} generate the following vectors:
! vect1: [res1,res3,res5,res7], vect2: [res2,res4,res6,res8].
! When vectorizing outer-loops, we execute the inner-loop sequentially
! (each vectorized inner-loop iteration contributes to VF outer-loop
! iterations in parallel). We therefore don't allow changing the order
! of the computation in the inner-loop during outer-loop vectorization. */
!
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction
! && !nested_in_vect_loop_p (vect_loop, stmt))
ordered_p = false;
else
ordered_p = true;
*************** reduction_code_for_scalar_code (enum tre
*** 2004,2011 ****
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (struct loop *loop, tree phi)
{
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
--- 2075,2084 ----
Conditions 2,3 are tested in vect_mark_stmts_to_be_vectorized. */
tree
! vect_is_simple_reduction (loop_vec_info loop_info, tree phi)
{
+ struct loop *loop = (bb_for_stmt (phi))->loop_father;
+ struct loop *vect_loop = LOOP_VINFO_LOOP (loop_info);
edge latch_e = loop_latch_edge (loop);
tree loop_arg = PHI_ARG_DEF_FROM_EDGE (phi, latch_e);
tree def_stmt, def1, def2;
*************** vect_is_simple_reduction (struct loop *l
*** 2018,2023 ****
--- 2091,2098 ----
imm_use_iterator imm_iter;
use_operand_p use_p;
+ gcc_assert (loop == vect_loop || flow_loop_nested_p (vect_loop, loop));
+
name = PHI_RESULT (phi);
nloop_uses = 0;
FOR_EACH_IMM_USE_FAST (use_p, imm_iter, name)
*************** vect_is_simple_reduction (struct loop *l
*** 2129,2136 ****
return NULL_TREE;
}
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations)
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2204,2219 ----
return NULL_TREE;
}
+ /* Generally, when vectorizing a reduction we change the order of the
+ computation. This may change the behavior of the program in some
+ cases, so we need to check that this is ok. One exception is when
+ vectorizing an outer-loop: the inner-loop is executed sequentially,
+ and therefore vectorizing reductions in the inner-loop during
+ outer-loop vectorization is safe. */
+
/* CHECKME: check for !flag_finite_math_only too? */
! if (SCALAR_FLOAT_TYPE_P (type) && !flag_unsafe_math_optimizations
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2140,2146 ****
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
--- 2223,2230 ----
}
return NULL_TREE;
}
! else if (INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_TRAPS (type)
! && !nested_in_vect_loop_p (vect_loop, def_stmt))
{
/* Changing the order of operations changes the semantics. */
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_is_simple_reduction (struct loop *l
*** 2179,2191 ****
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop by a GIMPLE_MODIFY_STMT,
! or it's an induction (defined by some phi node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
--- 2263,2278 ----
/* Check that one def is the reduction def, defined by PHI,
! the other def is either defined in the loop ("vect_loop_def"),
! or it's an induction (defined by a loop-header phi-node). */
if (def2 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def1))
&& (TREE_CODE (def1) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_induction_def
! || (TREE_CODE (def1) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def1)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def1)))))
{
if (vect_print_dump_info (REPORT_DETAILS))
{
*************** vect_is_simple_reduction (struct loop *l
*** 2197,2203 ****
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
--- 2284,2293 ----
else if (def1 == phi
&& flow_bb_inside_loop_p (loop, bb_for_stmt (def2))
&& (TREE_CODE (def2) == GIMPLE_MODIFY_STMT
! || STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_induction_def
! || (TREE_CODE (def2) == PHI_NODE
! && STMT_VINFO_DEF_TYPE (vinfo_for_stmt (def2)) == vect_loop_def
! && !is_loop_header_bb_p (bb_for_stmt (def2)))))
{
/* Swap operands (just for simplicity - so that the rest of the code
can assume that the reduction variable is always the last (second)
*************** vectorize_loops (void)
*** 2336,2342 ****
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo);
loop->aux = NULL;
}
--- 2426,2432 ----
if (!loop)
continue;
loop_vinfo = loop->aux;
! destroy_loop_vec_info (loop_vinfo, true);
loop->aux = NULL;
}
Index: tree-vectorizer.h
===================================================================
*** tree-vectorizer.h (revision 127356)
--- tree-vectorizer.h (working copy)
*************** typedef struct _loop_vec_info {
*** 92,100 ****
/* The loop basic blocks. */
basic_block *bbs;
- /* The loop exit_condition. */
- tree exit_cond;
-
/* Number of iterations. */
tree num_iters;
--- 92,97 ----
*************** typedef struct _loop_vec_info {
*** 144,150 ****
/* Access Functions. */
#define LOOP_VINFO_LOOP(L) (L)->loop
#define LOOP_VINFO_BBS(L) (L)->bbs
- #define LOOP_VINFO_EXIT_COND(L) (L)->exit_cond
#define LOOP_VINFO_NITERS(L) (L)->num_iters
#define LOOP_VINFO_COST_MODEL_MIN_ITERS(L) (L)->min_profitable_iters
#define LOOP_VINFO_VECTORIZABLE_P(L) (L)->vectorizable
--- 141,146 ----
*************** typedef struct _loop_vec_info {
*** 165,170 ****
--- 161,179 ----
#define LOOP_VINFO_NITERS_KNOWN_P(L) \
NITERS_KNOWN_P((L)->num_iters)
+ static inline loop_vec_info
+ loop_vec_info_for_loop (struct loop *loop)
+ {
+ return (loop_vec_info) loop->aux;
+ }
+
+ static inline bool
+ nested_in_vect_loop_p (struct loop *loop, tree stmt)
+ {
+ return (loop->inner
+ && (loop->inner == (bb_for_stmt (stmt))->loop_father));
+ }
+
/*-----------------------------------------------------------------*/
/* Info on vectorized defs. */
/*-----------------------------------------------------------------*/
*************** enum stmt_vec_info_type {
*** 180,191 ****
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
--- 189,203 ----
induc_vec_info_type,
type_promotion_vec_info_type,
type_demotion_vec_info_type,
! type_conversion_vec_info_type,
! loop_exit_ctrl_vec_info_type
};
/* Indicates whether/how a variable is used in the loop. */
enum vect_relevant {
vect_unused_in_loop = 0,
+ vect_used_in_outer_by_reduction,
+ vect_used_in_outer,
/* defs that feed computations that end up (only) in a reduction. These
defs may be used by non-reduction stmts, but eventually, any
*************** is_pattern_stmt_p (stmt_vec_info stmt_in
*** 403,408 ****
--- 415,429 ----
return false;
}
+ static inline bool
+ is_loop_header_bb_p (basic_block bb)
+ {
+ if (bb == (bb->loop_father)->header)
+ return true;
+ gcc_assert (EDGE_COUNT (bb->preds) == 1);
+ return false;
+ }
+
/*-----------------------------------------------------------------*/
/* Info on data references alignment. */
/*-----------------------------------------------------------------*/
*************** extern tree get_vectype_for_scalar_type
*** 462,468 ****
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (struct loop *, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
--- 483,489 ----
extern bool vect_is_simple_use (tree, loop_vec_info, tree *, tree *,
enum vect_def_type *);
extern bool vect_is_simple_iv_evolution (unsigned, tree, tree *, tree *);
! extern tree vect_is_simple_reduction (loop_vec_info, tree);
extern bool vect_can_force_dr_alignment_p (tree, unsigned int);
extern enum dr_alignment_support vect_supportable_dr_alignment
(struct data_reference *);
*************** extern bool supportable_narrowing_operat
*** 474,480 ****
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
--- 495,501 ----
/* Creation and deletion of loop and stmt info structs. */
extern loop_vec_info new_loop_vec_info (struct loop *loop);
! extern void destroy_loop_vec_info (loop_vec_info, bool);
extern stmt_vec_info new_stmt_vec_info (tree stmt, loop_vec_info);
Index: tree-vect-analyze.c
===================================================================
*** tree-vect-analyze.c (revision 127356)
--- tree-vect-analyze.c (working copy)
*************** vect_analyze_operations (loop_vec_info l
*** 325,330 ****
--- 325,348 ----
print_generic_expr (vect_dump, phi, TDF_SLIM);
}
+ if (! is_loop_header_bb_p (bb))
+ {
+ /* inner-loop loop-closed exit phi in outer-loop vectorization
+ (i.e. a phi in the tail of the outer-loop).
+ FORNOW: we currently don't support the case that these phis
+ are not used in the outer-loop, because this case requires us
+ to actually do something here. */
+ if (!STMT_VINFO_RELEVANT_P (stmt_info)
+ || STMT_VINFO_LIVE_P (stmt_info))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump,
+ "Unsupported loop-closed phi in outer-loop.");
+ return false;
+ }
+ continue;
+ }
+
gcc_assert (stmt_info);
if (STMT_VINFO_LIVE_P (stmt_info))
*************** vect_analyze_operations (loop_vec_info l
*** 398,404 ****
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_unused_in_loop);
break;
case vect_induction_def:
--- 416,424 ----
break;
case vect_reduction_def:
! gcc_assert (relevance == vect_used_in_outer
! || relevance == vect_used_in_outer_by_reduction
! || relevance == vect_unused_in_loop);
break;
case vect_induction_def:
*************** exist_non_indexing_operands_for_use_p (t
*** 589,638 ****
}
! /* Function vect_analyze_scalar_cycles.
!
! Examine the cross iteration def-use cycles of scalar variables, by
! analyzing the loop (scalar) PHIs; Classify each cycle as one of the
! following: invariant, induction, reduction, unknown.
!
! Some forms of scalar cycles are not yet supported.
!
! Example1: reduction: (unsupported yet)
!
! loop1:
! for (i=0; i<N; i++)
! sum += a[i];
!
! Example2: induction: (unsupported yet)
!
! loop2:
! for (i=0; i<N; i++)
! a[i] = i;
!
! Note: the following loop *is* vectorizable:
!
! loop3:
! for (i=0; i<N; i++)
! a[i] = b[i];
! even though it has a def-use cycle caused by the induction variable i:
!
! loop: i_2 = PHI (i_0, i_1)
! a[i_2] = ...;
! i_1 = i_2 + 1;
! GOTO loop;
!
! because the def-use cycle in loop3 is considered "not relevant" - i.e.,
! it does not need to be vectorized because it is only used for array
! indexing (see 'mark_stmts_to_be_vectorized'). The def-use cycle in
! loop2 on the other hand is relevant (it is being written to memory).
! */
static void
! vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
{
tree phi;
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
--- 609,625 ----
}
! /* Function vect_analyze_scalar_cycles_1.
! Examine the cross iteration def-use cycles of scalar variables
! in LOOP. LOOP_VINFO represents the loop that is now being
! considered for vectorization (can be LOOP, or an outer-loop
! enclosing LOOP). */
static void
! vect_analyze_scalar_cycles_1 (loop_vec_info loop_vinfo, struct loop *loop)
{
tree phi;
basic_block bb = loop->header;
tree dumy;
VEC(tree,heap) *worklist = VEC_alloc (tree, heap, 64);
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 698,704 ****
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
--- 685,691 ----
gcc_assert (is_gimple_reg (SSA_NAME_VAR (def)));
gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_unknown_def_type);
! reduc_stmt = vect_is_simple_reduction (loop_vinfo, phi);
if (reduc_stmt)
{
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_analyze_scalar_cycles (loop_vec_inf
*** 717,722 ****
--- 704,751 ----
}
+ /* Function vect_analyze_scalar_cycles.
+
+ Examine the cross iteration def-use cycles of scalar variables, by
+ analyzing the loop-header PHIs of scalar variables; Classify each
+ cycle as one of the following: invariant, induction, reduction, unknown.
+ We do that for the loop represented by LOOP_VINFO, and also for its
+ inner-loop, if it exists.
+ Examples for scalar cycles:
+
+ Example1: reduction:
+
+ loop1:
+ for (i=0; i<N; i++)
+ sum += a[i];
+
+ Example2: induction:
+
+ loop2:
+ for (i=0; i<N; i++)
+ a[i] = i; */
+
+ static void
+ vect_analyze_scalar_cycles (loop_vec_info loop_vinfo)
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop);
+
+ /* When vectorizing an outer-loop, the inner-loop is executed sequentially.
+ Reductions in such an inner-loop therefore have different properties than
+ the reductions in the nest that gets vectorized:
+ 1. When vectorized, they are executed in the same order as in the original
+ scalar loop, so we can't change the order of computation when
+ vectorizing them.
+ 2. FIXME: Inner-loop reductions can be used in the inner-loop, so the
+ current checks are too strict. */
+
+ if (loop->inner)
+ vect_analyze_scalar_cycles_1 (loop_vinfo, loop->inner);
+ }
+
+
/* Function vect_insert_into_interleaving_chain.
Insert DRA into the interleaving chain of DRB according to DRA's INIT. */
*************** vect_enhance_data_refs_alignment (loop_v
*** 1722,1728 ****
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning = flag_tree_vect_loop_version && (!optimize_size);
if (do_versioning)
{
--- 1751,1760 ----
4) all misaligned data refs with a known misalignment are supported, and
5) the number of runtime alignment checks is within reason. */
! do_versioning =
! flag_tree_vect_loop_version
! && (!optimize_size)
! && (!loop->inner);
if (do_versioning)
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2105,2110 ****
--- 2137,2143 ----
{
tree stmt;
stmt_vec_info stmt_info;
+ basic_block bb;
if (!dr || !DR_REF (dr))
{
*************** vect_analyze_data_refs (loop_vec_info lo
*** 2117,2122 ****
--- 2150,2165 ----
stmt = DR_STMT (dr);
stmt_info = vinfo_for_stmt (stmt);
+ /* If outer-loop vectorization: we don't yet support datarefs
+ in the innermost loop. */
+ bb = bb_for_stmt (stmt);
+ if (bb->loop_father != LOOP_VINFO_LOOP (loop_vinfo))
+ {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+ fprintf (vect_dump, "not vectorized: data-ref in nested loop");
+ return false;
+ }
+
if (STMT_VINFO_DATA_REF (stmt_info))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
*************** vect_mark_relevant (VEC(tree,heap) **wor
*** 2204,2214 ****
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to vectorized.
Instead mark the pattern-stmt that replaces it. */
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
- pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
--- 2247,2259 ----
/* This is the last stmt in a sequence that was detected as a
pattern that can potentially be vectorized. Don't mark the stmt
! as relevant/live because it's not going to be vectorized.
Instead mark the pattern-stmt that replaces it. */
+
+ pattern_stmt = STMT_VINFO_RELATED_STMT (stmt_info);
+
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "last stmt in pattern. don't mark relevant/live.");
stmt_info = vinfo_for_stmt (pattern_stmt);
gcc_assert (STMT_VINFO_RELATED_STMT (stmt_info) == stmt);
save_relevant = STMT_VINFO_RELEVANT (stmt_info);
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2258,2264 ****
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt) && (stmt != LOOP_VINFO_EXIT_COND (loop_vinfo)))
*relevant = vect_used_in_loop;
/* changing memory. */
--- 2303,2310 ----
*live_p = false;
/* cond stmt other than loop exit cond. */
! if (is_ctrl_stmt (stmt)
! && STMT_VINFO_TYPE (vinfo_for_stmt (stmt)) != loop_exit_ctrl_vec_info_type)
*relevant = vect_used_in_loop;
/* changing memory. */
*************** vect_stmt_relevant_p (tree stmt, loop_ve
*** 2315,2320 ****
--- 2361,2368 ----
of the respective DEF_STMT is left unchanged.
- case 2: If STMT is a reduction phi and DEF_STMT is a reduction stmt, we
skip DEF_STMT cause it had already been processed.
+ - case 3: If DEF_STMT and STMT are in different nests, then "relevant" will
+ be modified accordingly.
Return true if everything is as expected. Return false otherwise. */
*************** process_use (tree stmt, tree use, loop_v
*** 2325,2331 ****
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block def_bb;
tree def, def_stmt;
enum vect_def_type dt;
--- 2373,2379 ----
struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
stmt_vec_info dstmt_vinfo;
! basic_block bb, def_bb;
tree def, def_stmt;
enum vect_def_type dt;
*************** process_use (tree stmt, tree use, loop_v
*** 2346,2362 ****
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! return true;
! /* case 2: A reduction phi defining a reduction stmt (DEF_STMT). DEF_STMT
! must have already been processed, so we just check that everything is as
! expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def)
{
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
--- 2394,2420 ----
def_bb = bb_for_stmt (def_stmt);
if (!flow_bb_inside_loop_p (loop, def_bb))
! {
! if (vect_print_dump_info (REPORT_DETAILS))
! fprintf (vect_dump, "def_stmt is out of loop.");
! return true;
! }
! /* case 2: A reduction phi (STMT) defined by a reduction stmt (DEF_STMT).
! DEF_STMT must have already been processed, because this should be the
! only way that STMT, which is a reduction-phi, was put in the worklist,
! as there should be no other uses for DEF_STMT in the loop. So we just
! check that everything is as expected, and we are done. */
dstmt_vinfo = vinfo_for_stmt (def_stmt);
+ bb = bb_for_stmt (stmt);
if (TREE_CODE (stmt) == PHI_NODE
&& STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def
&& TREE_CODE (def_stmt) != PHI_NODE
! && STMT_VINFO_DEF_TYPE (dstmt_vinfo) == vect_reduction_def
! && bb->loop_father == def_bb->loop_father)
{
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "reduc-stmt defining reduc-phi in the same nest.");
if (STMT_VINFO_IN_PATTERN_P (dstmt_vinfo))
dstmt_vinfo = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (dstmt_vinfo));
gcc_assert (STMT_VINFO_RELEVANT (dstmt_vinfo) < vect_used_by_reduction);
*************** process_use (tree stmt, tree use, loop_v
*** 2365,2370 ****
--- 2423,2495 ----
return true;
}
+ /* case 3a: outer-loop stmt defining an inner-loop stmt:
+ outer-loop-header-bb:
+ d = def_stmt
+ inner-loop:
+ stmt # use (d)
+ outer-loop-tail-bb:
+ ... */
+ if (flow_loop_nested_p (def_bb->loop_father, bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop def-stmt defining inner-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_by_reduction : vect_unused_in_loop;
+ break;
+ case vect_used_in_outer_by_reduction:
+ relevant = vect_used_by_reduction;
+ break;
+ case vect_used_in_outer:
+ relevant = vect_used_in_loop;
+ break;
+ case vect_used_by_reduction:
+ case vect_used_in_loop:
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
+ /* case 3b: inner-loop stmt defining an outer-loop stmt:
+ outer-loop-header-bb:
+ ...
+ inner-loop:
+ d = def_stmt
+ outer-loop-tail-bb:
+ stmt # use (d) */
+ else if (flow_loop_nested_p (bb->loop_father, def_bb->loop_father))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "inner-loop def-stmt defining outer-loop stmt.");
+ switch (relevant)
+ {
+ case vect_unused_in_loop:
+ relevant = (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) ?
+ vect_used_in_outer_by_reduction : vect_unused_in_loop;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ break;
+
+ case vect_used_by_reduction:
+ relevant = vect_used_in_outer_by_reduction;
+ break;
+
+ case vect_used_in_loop:
+ relevant = vect_used_in_outer;
+ break;
+
+ default:
+ gcc_unreachable ();
+ }
+ }
+
vect_mark_relevant (worklist, def_stmt, relevant, live_p);
return true;
}
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2473,2497 ****
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt; Other
! reduction stmts are expected to be unused in the loop. These are the
! expected values of "relevant" for reduction phis/stmts in the loop:
relevance: phi stmt
vect_unused_in_loop ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! switch (relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
break;
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
--- 2598,2635 ----
identify stmts that are used solely by a reduction, and therefore the
order of the results that they produce does not have to be kept.
! Reduction phis are expected to be used by a reduction stmt, or by a stmt
! in an outer loop; Other reduction stmts are expected to be
! in the loop, and possibly used by a stmt in an outer loop.
! Here are the expected values of "relevant" for reduction phis/stmts:
relevance: phi stmt
vect_unused_in_loop ok
+ vect_used_in_outer_by_reduction ok ok
+ vect_used_in_outer ok ok
vect_used_by_reduction ok
vect_used_in_loop */
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def)
{
! enum vect_relevant tmp_relevant = relevant;
! switch (tmp_relevant)
{
case vect_unused_in_loop:
gcc_assert (TREE_CODE (stmt) != PHI_NODE);
+ relevant = vect_used_by_reduction;
+ break;
+
+ case vect_used_in_outer_by_reduction:
+ case vect_used_in_outer:
+ gcc_assert (TREE_CODE (stmt) != WIDEN_SUM_EXPR
+ && TREE_CODE (stmt) != DOT_PROD_EXPR);
break;
+
case vect_used_by_reduction:
if (TREE_CODE (stmt) == PHI_NODE)
break;
+ /* fall through */
case vect_used_in_loop:
default:
if (vect_print_dump_info (REPORT_DETAILS))
*************** vect_mark_stmts_to_be_vectorized (loop_v
*** 2499,2505 ****
VEC_free (tree, heap, worklist);
return false;
}
- relevant = vect_used_by_reduction;
live_p = false;
}
--- 2637,2642 ----
*************** vect_get_loop_niters (struct loop *loop,
*** 2641,2651 ****
}
/* Function vect_analyze_loop_form.
! Verify the following restrictions (some may be relaxed in the future):
! - it's an inner-most loop
! - number of BBs = 2 (which are the loop header and the latch)
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
--- 2778,2816 ----
}
+ /* Function vect_analyze_loop_1.
+
+ Apply a set of analyses on LOOP, and create a loop_vec_info struct
+ for it. The different analyses will record information in the
+ loop_vec_info struct. This is a subset of the analyses applied in
+ vect_analyze_loop, to be applied on an inner-loop nested in the loop
+ that is now considered for (outer-loop) vectorization. */
+
+ static loop_vec_info
+ vect_analyze_loop_1 (struct loop *loop)
+ {
+ loop_vec_info loop_vinfo;
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "===== analyze_loop_nest_1 =====");
+
+ /* Check the CFG characteristics of the loop (nesting, entry/exit, etc.). */
+
+ loop_vinfo = vect_analyze_loop_form (loop);
+ if (!loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "bad inner-loop form.");
+ return NULL;
+ }
+
+ return loop_vinfo;
+ }
+
+
/* Function vect_analyze_loop_form.
! Verify that certain CFG restrictions hold, including:
- the loop has a pre-header
- the loop has a single entry and exit
- the loop exit condition is simple enough, and the number of iterations
*************** vect_analyze_loop_form (struct loop *loo
*** 2657,2687 ****
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! if (loop->inner)
{
! if (vect_print_dump_info (REPORT_OUTER_LOOPS))
! fprintf (vect_dump, "not vectorized: nested loop.");
return NULL;
}
if (!single_exit (loop)
- || loop->num_nodes != 2
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
- else if (loop->num_nodes != 2)
- fprintf (vect_dump, "not vectorized: too many BBs in loop.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
!
return NULL;
}
--- 2822,2955 ----
loop_vec_info loop_vinfo;
tree loop_cond;
tree number_of_iterations = NULL;
+ loop_vec_info inner_loop_vinfo = NULL;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== vect_analyze_loop_form ===");
! /* Different restrictions apply when we are considering an inner-most loop,
! vs. an outer (nested) loop.
! (FORNOW. May want to relax some of these restrictions in the future). */
!
! if (!loop->inner)
! {
! /* Inner-most loop. We currently require that the number of BBs is
! exactly 2 (the header and latch). Vectorizable inner-most loops
! look like this:
!
! (pre-header)
! |
! header <--------+
! | | |
! | +--> latch --+
! |
! (exit-bb) */
!
! if (loop->num_nodes != 2)
! {
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: too many BBs in loop.");
! return NULL;
! }
!
! if (empty_block_p (loop->header))
{
! if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
! fprintf (vect_dump, "not vectorized: empty loop.");
return NULL;
}
+ }
+ else
+ {
+ struct loop *innerloop = loop->inner;
+ edge backedge, entryedge;
+
+ /* Nested loop. We currently require that the loop is doubly-nested,
+ contains a single inner loop, and the number of BBs is exactly 5.
+ Vectorizable outer-loops look like this:
+
+ (pre-header)
+ |
+ header <---+
+ | |
+ inner-loop |
+ | |
+ tail ------+
+ |
+ (exit-bb)
+
+ The inner-loop has the properties expected of inner-most loops
+ as described above. */
+
+ if ((loop->inner)->inner || (loop->inner)->next)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: multiple nested loops.");
+ return NULL;
+ }
+
+ /* Analyze the inner-loop. */
+ inner_loop_vinfo = vect_analyze_loop_1 (loop->inner);
+ if (!inner_loop_vinfo)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: Bad inner loop.");
+ return NULL;
+ }
+
+ if (!expr_invariant_in_loop_p (loop,
+ LOOP_VINFO_NITERS (inner_loop_vinfo)))
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump,
+ "not vectorized: inner-loop count not invariant.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ if (loop->num_nodes != 5)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: too many BBs in loop.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ gcc_assert (EDGE_COUNT (innerloop->header->preds) == 2);
+ backedge = EDGE_PRED (innerloop->header, 1);
+ entryedge = EDGE_PRED (innerloop->header, 0);
+ if (EDGE_PRED (innerloop->header, 0)->src == innerloop->latch)
+ {
+ backedge = EDGE_PRED (innerloop->header, 0);
+ entryedge = EDGE_PRED (innerloop->header, 1);
+ }
+
+ if (entryedge->src != loop->header
+ || !single_exit (innerloop)
+ || single_exit (innerloop)->dest != EDGE_PRED (loop->latch, 0)->src)
+ {
+ if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+ fprintf (vect_dump, "not vectorized: unsupported outerloop form.");
+ destroy_loop_vec_info (inner_loop_vinfo, true);
+ return NULL;
+ }
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "Considering outer-loop vectorization.");
+ }
if (!single_exit (loop)
|| EDGE_COUNT (loop->header->preds) != 2)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
{
if (!single_exit (loop))
fprintf (vect_dump, "not vectorized: multiple exits.");
else if (EDGE_COUNT (loop->header->preds) != 2)
fprintf (vect_dump, "not vectorized: too many incoming edges.");
}
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2694,2699 ****
--- 2962,2969 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: unexpected loop form.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2711,2732 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
return NULL;
}
}
- if (empty_block_p (loop->header))
- {
- if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
- fprintf (vect_dump, "not vectorized: empty loop.");
- return NULL;
- }
-
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
return NULL;
}
--- 2981,2999 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: abnormal loop exit edge.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
}
loop_cond = vect_get_loop_niters (loop, &number_of_iterations);
if (!loop_cond)
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "not vectorized: complicated exit condition.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2735,2740 ****
--- 3002,3009 ----
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump,
"not vectorized: number of iterations cannot be computed.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop_form (struct loop *loo
*** 2742,2748 ****
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! return false;
}
if (!NITERS_KNOWN_P (number_of_iterations))
--- 3011,3019 ----
{
if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
fprintf (vect_dump, "Infinite number of iterations.");
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, true);
! return NULL;
}
if (!NITERS_KNOWN_P (number_of_iterations))
*************** vect_analyze_loop_form (struct loop *loo
*** 2757,2768 ****
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
! LOOP_VINFO_EXIT_COND (loop_vinfo) = loop_cond;
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
--- 3028,3046 ----
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump, "not vectorized: number of iterations = 0.");
+ if (inner_loop_vinfo)
+ destroy_loop_vec_info (inner_loop_vinfo, false);
return NULL;
}
loop_vinfo = new_loop_vec_info (loop);
LOOP_VINFO_NITERS (loop_vinfo) = number_of_iterations;
!
! STMT_VINFO_TYPE (vinfo_for_stmt (loop_cond)) = loop_exit_ctrl_vec_info_type;
!
! /* CHECKME: May want to keep it around in the future. */
! if (inner_loop_vinfo)
! destroy_loop_vec_info (inner_loop_vinfo, false);
gcc_assert (!loop->aux);
loop->aux = loop_vinfo;
*************** vect_analyze_loop (struct loop *loop)
*** 2784,2789 ****
--- 3062,3076 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "===== analyze_loop_nest =====");
+ if (loop_outer (loop)
+ && loop_vec_info_for_loop (loop_outer (loop))
+ && LOOP_VINFO_VECTORIZABLE_P (loop_vec_info_for_loop (loop_outer (loop))))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "outer-loop already vectorized.");
+ return NULL;
+ }
+
/* Check the CFG characteristics of the loop (nesting, entry/exit, etc. */
loop_vinfo = vect_analyze_loop_form (loop);
*************** vect_analyze_loop (struct loop *loop)
*** 2805,2811 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3092,3098 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data references.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2823,2829 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3110,3116 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "unexpected pattern.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2835,2841 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3122,3128 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2844,2850 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3131,3137 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "can't determine vectorization factor.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2856,2862 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3143,3149 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data dependence.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2868,2874 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3155,3161 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data access.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2880,2886 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3167,3173 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad data alignment.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
*************** vect_analyze_loop (struct loop *loop)
*** 2892,2898 ****
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo);
return NULL;
}
--- 3179,3185 ----
{
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "bad operation or unsupported loop bound.");
! destroy_loop_vec_info (loop_vinfo, true);
return NULL;
}
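
Illustration only, not part of the patch: a minimal sketch in plain C of the kind of loop nest the vect_analyze_loop_form checks above appear intended to accept. It is doubly nested, has a single inner loop, an inner-loop trip count that is invariant in the outer loop, and no memory references in the inner loop. The names outer_scalar and outer_by_lanes and the lane count VF are hypothetical. The second function only models the effect of outer-loop vectorization (VF consecutive outer iterations advancing in lock-step through the inner loop); it is not the code the vectorizer generates.

```c
#include <assert.h>

#define VF 4  /* hypothetical vectorization factor (lanes per vector) */

/* Scalar reference.  The inner loop touches no memory, and its trip
   count m does not change across outer-loop iterations.  */
void
outer_scalar (int *out, int n, int m)
{
  for (int i = 0; i < n; i++)
    {
      int s = 0;
      for (int j = 0; j < m; j++)
        s += i + j;
      out[i] = s;
    }
}

/* Lane-wise model of outer-loop vectorization: each "vector" holds the
   values of VF consecutive outer iterations, and the inner loop runs
   once for all lanes together.  */
void
outer_by_lanes (int *out, int n, int m)
{
  assert (n % VF == 0);  /* sketch only: no epilogue loop is modeled */

  for (int i = 0; i < n; i += VF)
    {
      int vi[VF], vs[VF];

      for (int l = 0; l < VF; l++)
        {
          vi[l] = i + l;  /* outer induction: [i, i+1, i+2, i+3] */
          vs[l] = 0;
        }

      for (int j = 0; j < m; j++)       /* inner loop stays scalar in j */
        for (int l = 0; l < VF; l++)    /* one vector operation */
          vs[l] += vi[l] + j;

      for (int l = 0; l < VF; l++)
        out[i + l] = vs[l];
    }
}
```

Both functions should fill out[] identically for any n that is a multiple of VF. Lane l of every vector always belongs to outer iteration i+l, which is why the inner-loop computation must keep its results per lane and in order.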
Index: tree-vect-patterns.c
===================================================================
*** tree-vect-patterns.c (revision 127356)
--- tree-vect-patterns.c (working copy)
*************** widened_name_p (tree name, tree use_stmt
*** 148,154 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
! */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 148,161 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_DOT_PRODUCT <x_t, y_t, sum_0>
!
! Note: The dot-prod idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized). */
static tree
vect_recog_dot_prod_pattern (tree last_stmt, tree *type_in, tree *type_out)
*************** vect_recog_dot_prod_pattern (tree last_s
*** 160,165 ****
--- 167,174 ----
tree type, half_type;
tree pattern_expr;
tree prod_type;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 242,247 ****
--- 251,260 ----
gcc_assert (stmt_vinfo);
if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_loop_def)
return NULL;
+ /* FORNOW. Can continue analyzing the def-use chain when this stmt is a phi
+ inside the loop (in case we are analyzing an outer-loop). */
+ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT)
+ return NULL;
expr = GIMPLE_STMT_OPERAND (stmt, 1);
if (TREE_CODE (expr) != MULT_EXPR)
return NULL;
*************** vect_recog_dot_prod_pattern (tree last_s
*** 295,300 ****
--- 308,323 ----
fprintf (vect_dump, "vect_recog_dot_prod_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_dot_prod_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
*************** vect_recog_pow_pattern (tree last_stmt,
*** 521,527 ****
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
! */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
--- 544,557 ----
* Return value: A new stmt that will be used to replace the sequence of
stmts that constitute the pattern. In this case it will be:
WIDEN_SUM <x_t, sum_0>
!
! Note: The widening-sum idiom is a widening reduction pattern that is
! vectorized without preserving all the intermediate results. It
! produces only N/2 (widened) results (by summing up pairs of
! intermediate results) rather than all N results. Therefore, we
! cannot allow this pattern when we want to get all the results and in
! the correct order (as is the case when this computation is in an
! inner-loop nested in an outer-loop that is being vectorized). */
static tree
vect_recog_widen_sum_pattern (tree last_stmt, tree *type_in, tree *type_out)
*************** vect_recog_widen_sum_pattern (tree last_
*** 531,536 ****
--- 561,568 ----
stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt);
tree type, half_type;
tree pattern_expr;
+ loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_info);
if (TREE_CODE (last_stmt) != GIMPLE_MODIFY_STMT)
return NULL;
*************** vect_recog_widen_sum_pattern (tree last_
*** 580,585 ****
--- 612,627 ----
fprintf (vect_dump, "vect_recog_widen_sum_pattern: detected: ");
print_generic_expr (vect_dump, pattern_expr, TDF_SLIM);
}
+
+ /* We don't allow changing the order of the computation in the inner-loop
+ when doing outer-loop vectorization. */
+ if (nested_in_vect_loop_p (loop, last_stmt))
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "vect_recog_widen_sum_pattern: not allowed.");
+ return NULL;
+ }
+
return pattern_expr;
}
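
Illustration only, not part of the patch: a small C model of why the dot-product and widening-sum patterns are rejected for statements nested in a vectorized outer loop. The comments above say these idioms produce only N/2 widened results by summing pairs of intermediate results. The function names, the element count N, and the choice of adjacent pairs are assumptions made for this sketch; which elements a real target folds together is target-dependent.

```c
#include <assert.h>
#include <stdint.h>

#define N 8  /* hypothetical number of narrow elements per vector */

/* Order-preserving widening: N narrow inputs give N wide results,
   one per input, in the original order.  */
void
widen_all (const int16_t *in, int32_t *out)
{
  for (int k = 0; k < N; k++)
    out[k] = (int32_t) in[k];
}

/* Model of a pairwise widening sum: N narrow inputs are folded into
   N/2 wide accumulators.  Adjacent pairs are assumed here.  */
void
widen_sum_pairs (const int16_t *in, int32_t *acc)
{
  for (int k = 0; k < N / 2; k++)
    acc[k] += (int32_t) in[2 * k] + (int32_t) in[2 * k + 1];
}

/* Sum of all lanes, as an inner-most-loop reduction epilogue would do.  */
int32_t
lanes_total (const int32_t *v, int n)
{
  assert (n >= 0);
  int32_t t = 0;
  for (int k = 0; k < n; k++)
    t += v[k];
  return t;
}
```

When the reduction is in the loop being vectorized, only the grand total matters, so folding pairs is harmless. In outer-loop vectorization each input lane belongs to a different outer iteration, so mixing lanes would combine values from unrelated iterations.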
Index: tree-vect-transform.c
===================================================================
*** tree-vect-transform.c (revision 127356)
--- tree-vect-transform.c (working copy)
*************** vect_estimate_min_profitable_iters (loop
*** 124,129 ****
--- 124,130 ----
basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
int nbbs = loop->num_nodes;
int byte_misalign;
+ int innerloop_iters, factor;
/* Cost model disabled. */
if (!flag_vect_cost_model)
*************** vect_estimate_min_profitable_iters (loop
*** 152,162 ****
--- 153,172 ----
TODO: Consider assigning different costs to different scalar
statements. */
+ /* FORNOW. */
+ if (loop->inner)
+ innerloop_iters = 50; /* FIXME */
+
for (i = 0; i < nbbs; i++)
{
block_stmt_iterator si;
basic_block bb = bbs[i];
+ if (bb->loop_father == loop->inner)
+ factor = innerloop_iters;
+ else
+ factor = 1;
+
for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si))
{
tree stmt = bsi_stmt (si);
*************** vect_estimate_min_profitable_iters (loop
*** 164,171 ****
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt);
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info);
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
--- 174,183 ----
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
continue;
! scalar_single_iter_cost += cost_for_stmt (stmt) * factor;
! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor;
! /* FIXME: for stmts in the inner-loop in outer-loop vectorization,
! some of the "outside" costs are generated inside the outer-loop. */
vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info);
}
}
*************** vect_init_vector (tree stmt, tree vector
*** 1071,1076 ****
--- 1083,1091 ----
tree new_temp;
basic_block new_bb;
+ if (nested_in_vect_loop_p (loop, stmt))
+ loop = loop->inner;
+
new_var = vect_get_new_vect_var (vector_type, vect_simple_var, "cst_");
add_referenced_var (new_var);
*************** vect_init_vector (tree stmt, tree vector
*** 1096,1101 ****
--- 1111,1117 ----
/* Function get_initial_def_for_induction
Input:
+ STMT - a stmt that performs an induction operation in the loop.
IV_PHI - the initial value of the induction variable
Output:
*************** get_initial_def_for_induction (tree iv_p
*** 1114,1121 ****
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
basic_block new_bb;
- block_stmt_iterator bsi;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
--- 1130,1137 ----
tree vectype = get_vectype_for_scalar_type (scalar_type);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
edge pe = loop_preheader_edge (loop);
+ struct loop *iv_loop;
basic_block new_bb;
tree vec, vec_init, vec_step, t;
tree access_fn;
tree new_var;
*************** get_initial_def_for_induction (tree iv_p
*** 1129,1136 ****
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
tree stmts;
! tree stmt = NULL_TREE;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
--- 1145,1157 ----
int ncopies = vf / nunits;
tree expr;
stmt_vec_info phi_info = vinfo_for_stmt (iv_phi);
+ bool nested_in_vect_loop = false;
tree stmts;
! imm_use_iterator imm_iter;
! use_operand_p use_p;
! tree exit_phi;
! edge latch_e;
! tree loop_arg;
block_stmt_iterator si;
basic_block bb = bb_for_stmt (iv_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1139,1203 ****
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
- stmt = bsi_stmt (si);
! access_fn = analyze_scalar_evolution (loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
/* Create the vector that holds the initial_value of the induction. */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
!
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
{
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
}
!
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, new_name, t);
! for (i = 1; i < nunits; i++)
{
! tree tmp;
! /* Create: new_name = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
}
- vec = build_constructor_from_list (vectype, nreverse (t));
- vec_init = vect_init_vector (stmt, vec, vectype);
/* Create the vector that holds the step of the induction. */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
/* Create the following def-use cycle:
loop prolog:
! vec_init = [X, X+S, X+2*S, X+3*S]
! vec_step = [VF*S, VF*S, VF*S, VF*S]
loop:
vec_iv = PHI <vec_init, vec_loop>
...
--- 1160,1266 ----
/* Find the first insertion point in the BB. */
si = bsi_after_labels (bb);
! if (INTEGRAL_TYPE_P (scalar_type))
! step_expr = build_int_cst (scalar_type, 0);
! else
! step_expr = build_real (scalar_type, dconst0);
!
! /* Is phi in an inner-loop, while vectorizing an enclosing outer-loop? */
! if (nested_in_vect_loop_p (loop, iv_phi))
! {
! nested_in_vect_loop = true;
! iv_loop = loop->inner;
! }
! else
! iv_loop = loop;
! gcc_assert (iv_loop == (bb_for_stmt (iv_phi))->loop_father);
!
! latch_e = loop_latch_edge (iv_loop);
! loop_arg = PHI_ARG_DEF_FROM_EDGE (iv_phi, latch_e);
!
! access_fn = analyze_scalar_evolution (iv_loop, PHI_RESULT (iv_phi));
gcc_assert (access_fn);
! ok = vect_is_simple_iv_evolution (iv_loop->num, access_fn,
! &init_expr, &step_expr);
gcc_assert (ok);
+ pe = loop_preheader_edge (iv_loop);
/* Create the vector that holds the initial_value of the induction. */
! if (nested_in_vect_loop)
{
! /* iv_loop is nested in the loop to be vectorized. init_expr has already
! been created during vectorization of previous stmts; we obtain it from
! the STMT_VINFO_VEC_STMT of the defining stmt. */
! tree iv_def = PHI_ARG_DEF_FROM_EDGE (iv_phi, loop_preheader_edge (iv_loop));
! vec_init = vect_get_vec_def_for_operand (iv_def, iv_phi, NULL);
}
! else
{
! /* iv_loop is the loop to be vectorized. Create:
! vec_init = [X, X+S, X+2*S, X+3*S] (S = step_expr, X = init_expr) */
! new_var = vect_get_new_vect_var (scalar_type, vect_scalar_var, "var_");
! add_referenced_var (new_var);
!
! new_name = force_gimple_operand (init_expr, &stmts, false, new_var);
! if (stmts)
! {
! new_bb = bsi_insert_on_edge_immediate (pe, stmts);
! gcc_assert (!new_bb);
! }
!
! t = NULL_TREE;
! t = tree_cons (NULL_TREE, init_expr, t);
! for (i = 1; i < nunits; i++)
! {
! tree tmp;
! /* Create: new_name_i = new_name + step_expr */
! tmp = fold_build2 (PLUS_EXPR, scalar_type, new_name, step_expr);
! init_stmt = build_gimple_modify_stmt (new_var, tmp);
! new_name = make_ssa_name (new_var, init_stmt);
! GIMPLE_STMT_OPERAND (init_stmt, 0) = new_name;
! new_bb = bsi_insert_on_edge_immediate (pe, init_stmt);
! gcc_assert (!new_bb);
! if (vect_print_dump_info (REPORT_DETAILS))
! {
! fprintf (vect_dump, "created new init_stmt: ");
! print_generic_expr (vect_dump, init_stmt, TDF_SLIM);
! }
! t = tree_cons (NULL_TREE, new_name, t);
! }
! /* Create a vector from [new_name_0, new_name_1, ..., new_name_nunits-1] */
! vec = build_constructor_from_list (vectype, nreverse (t));
! vec_init = vect_init_vector (iv_phi, vec, vectype);
}
/* Create the vector that holds the step of the induction. */
! if (nested_in_vect_loop)
! /* iv_loop is nested in the loop to be vectorized. Generate:
! vec_step = [S, S, S, S] */
! new_name = step_expr;
! else
! {
! /* iv_loop is the loop to be vectorized. Generate:
! vec_step = [VF*S, VF*S, VF*S, VF*S] */
! expr = build_int_cst (scalar_type, vf);
! new_name = fold_build2 (MULT_EXPR, scalar_type, expr, step_expr);
! }
!
t = NULL_TREE;
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype);
/* Create the following def-use cycle:
loop prolog:
! vec_init = ...
! vec_step = ...
loop:
vec_iv = PHI <vec_init, vec_loop>
...
*************** get_initial_def_for_induction (tree iv_p
*** 1208,1214 ****
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
--- 1271,1277 ----
/* Create the induction-phi that defines the induction-operand. */
vec_dest = vect_get_new_vect_var (vectype, vect_simple_var, "vec_iv_");
add_referenced_var (vec_dest);
! induction_phi = create_phi_node (vec_dest, iv_loop->header);
set_stmt_info (get_stmt_ann (induction_phi),
new_stmt_vec_info (induction_phi, loop_vinfo));
induc_def = PHI_RESULT (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1219,1233 ****
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, loop_preheader_edge (loop));
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (loop));
! /* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
--- 1282,1297 ----
induc_def, vec_step));
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
/* Set the arguments of the phi node: */
! add_phi_arg (induction_phi, vec_init, pe);
! add_phi_arg (induction_phi, vec_def, loop_latch_edge (iv_loop));
! /* In case the vectorization factor (VF) is bigger than the number
of elements that we can fit in a vectype (nunits), we have to generate
more than one vector stmt - i.e - we need to "unroll" the
vector stmt by a factor VF/nunits. For more details see documentation
*************** get_initial_def_for_induction (tree iv_p
*** 1236,1241 ****
--- 1300,1307 ----
if (ncopies > 1)
{
stmt_vec_info prev_stmt_vinfo;
+ /* FORNOW. This restriction should be relaxed. */
+ gcc_assert (!nested_in_vect_loop);
/* Create the vector that holds the step of the induction. */
expr = build_int_cst (scalar_type, nunits);
*************** get_initial_def_for_induction (tree iv_p
*** 1244,1250 ****
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (stmt, vec, vectype);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
--- 1310,1316 ----
for (i = 0; i < nunits; i++)
t = tree_cons (NULL_TREE, unshare_expr (new_name), t);
vec = build_constructor_from_list (vectype, t);
! vec_step = vect_init_vector (iv_phi, vec, vectype);
vec_def = induc_def;
prev_stmt_vinfo = vinfo_for_stmt (induction_phi);
*************** get_initial_def_for_induction (tree iv_p
*** 1252,1270 ****
{
tree tmp;
! /* vec_i = vec_prev + vec_{step*nunits} */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi = bsi_for_stmt (stmt);
! vect_finish_stmt_generation (stmt, new_stmt, &bsi);
!
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
--- 1318,1367 ----
{
tree tmp;
! /* vec_i = vec_prev + vec_step */
tmp = build2 (PLUS_EXPR, vectype, vec_def, vec_step);
new_stmt = build_gimple_modify_stmt (NULL_TREE, tmp);
vec_def = make_ssa_name (vec_dest, new_stmt);
GIMPLE_STMT_OPERAND (new_stmt, 0) = vec_def;
! bsi_insert_before (&si, new_stmt, BSI_SAME_STMT);
! set_stmt_info (get_stmt_ann (new_stmt),
! new_stmt_vec_info (new_stmt, loop_vinfo));
STMT_VINFO_RELATED_STMT (prev_stmt_vinfo) = new_stmt;
prev_stmt_vinfo = vinfo_for_stmt (new_stmt);
}
}
+ if (nested_in_vect_loop)
+ {
+ /* Find the loop-closed exit-phi of the induction, and record
+ the final vector of induction results: */
+ exit_phi = NULL;
+ FOR_EACH_IMM_USE_FAST (use_p, imm_iter, loop_arg)
+ {
+ if (!flow_bb_inside_loop_p (iv_loop, bb_for_stmt (USE_STMT (use_p))))
+ {
+ exit_phi = USE_STMT (use_p);
+ break;
+ }
+ }
+ if (exit_phi)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+ /* FORNOW. Currently not supporting the case that an inner-loop induction
+ is not used in the outer-loop (i.e. only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = new_stmt;
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of inductions after inner-loop:");
+ print_generic_expr (vect_dump, new_stmt, TDF_SLIM);
+ }
+ }
+ }
+
+
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "transform induction: created def-use cycle:");
*************** vect_get_vec_def_for_operand (tree op, t
*** 1300,1306 ****
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
- struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vec_inv;
tree vec_cst;
tree t = NULL_TREE;
--- 1397,1402 ----
*************** vect_get_vec_def_for_operand (tree op, t
*** 1386,1399 ****
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
--- 1482,1501 ----
def_stmt_info = vinfo_for_stmt (def_stmt);
vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
gcc_assert (vec_stmt);
! if (TREE_CODE (vec_stmt) == PHI_NODE)
! vec_oprnd = PHI_RESULT (vec_stmt);
! else
! vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt, 0);
return vec_oprnd;
}
/* Case 4: operand is defined by a loop header phi - reduction */
case vect_reduction_def:
{
+ struct loop *loop;
+
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
+ loop = (bb_for_stmt (def_stmt))->loop_father;
/* Get the def before the loop */
op = PHI_ARG_DEF_FROM_EDGE (def_stmt, loop_preheader_edge (loop));
*************** vect_get_vec_def_for_operand (tree op, t
*** 1405,1412 ****
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def before the loop */
! return get_initial_def_for_induction (def_stmt);
}
default:
--- 1507,1518 ----
{
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
! /* Get the def from the vectorized stmt. */
! def_stmt_info = vinfo_for_stmt (def_stmt);
! vec_stmt = STMT_VINFO_VEC_STMT (def_stmt_info);
! gcc_assert (vec_stmt && (TREE_CODE (vec_stmt) == PHI_NODE));
! vec_oprnd = PHI_RESULT (vec_stmt);
! return vec_oprnd;
}
default:
*************** vect_get_vec_def_for_stmt_copy (enum vec
*** 1487,1493 ****
vec_stmt_for_operand = STMT_VINFO_RELATED_STMT (def_stmt_info);
gcc_assert (vec_stmt_for_operand);
vec_oprnd = GIMPLE_STMT_OPERAND (vec_stmt_for_operand, 0);
-
return vec_oprnd;
}
--- 1593,1598 ----
*************** vect_finish_stmt_generation (tree stmt,
*** 1503,1509 ****
--- 1608,1618 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ gcc_assert (stmt == bsi_stmt (*bsi));
+ gcc_assert (TREE_CODE (stmt) != LABEL_EXPR);
+
bsi_insert_before (bsi, vec_stmt, BSI_SAME_STMT);
+
set_stmt_info (get_stmt_ann (vec_stmt),
new_stmt_vec_info (vec_stmt, loop_vinfo));
*************** static tree
*** 1571,1576 ****
--- 1680,1687 ----
get_initial_def_for_reduction (tree stmt, tree init_val, tree *adjustment_def)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt);
+ loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
int nunits = TYPE_VECTOR_SUBPARTS (vectype);
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
*************** get_initial_def_for_reduction (tree stmt
*** 1581,1588 ****
--- 1692,1705 ----
tree t = NULL_TREE;
int i;
tree vector_type;
+ bool nested_in_vect_loop = false;
gcc_assert (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
+ if (nested_in_vect_loop_p (loop, stmt))
+ nested_in_vect_loop = true;
+ else
+ gcc_assert (loop == (bb_for_stmt (stmt))->loop_father);
+
vecdef = vect_get_vec_def_for_operand (init_val, stmt, NULL);
switch (code)
*************** get_initial_def_for_reduction (tree stmt
*** 1590,1596 ****
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
--- 1707,1716 ----
case WIDEN_SUM_EXPR:
case DOT_PROD_EXPR:
case PLUS_EXPR:
! if (nested_in_vect_loop)
! *adjustment_def = vecdef;
! else
! *adjustment_def = init_val;
/* Create a vector of zeros for init_def. */
if (INTEGRAL_TYPE_P (type))
def_for_init = build_int_cst (type, 0);
*************** vect_create_epilog_for_reduction (tree v
*** 1679,1702 ****
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp;
tree new_name;
! tree epilog_stmt;
! tree new_scalar_dest, exit_phi;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree scalar_initial_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result;
! tree reduction_op;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
int op_type;
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
--- 1799,1829 ----
tree new_phi;
block_stmt_iterator exit_bsi;
tree vec_dest;
! tree new_temp = NULL_TREE;
tree new_name;
! tree epilog_stmt = NULL_TREE;
! tree new_scalar_dest, exit_phi, new_dest;
tree bitsize, bitpos, bytesize;
enum tree_code code = TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 1));
! tree adjustment_def;
tree vec_initial_def;
tree orig_name;
imm_use_iterator imm_iter;
use_operand_p use_p;
! bool extract_scalar_result = false;
! tree reduction_op, expr;
tree orig_stmt;
tree use_stmt;
tree operation = GIMPLE_STMT_OPERAND (stmt, 1);
+ bool nested_in_vect_loop = false;
int op_type;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ nested_in_vect_loop = true;
+ }
+
op_type = TREE_OPERAND_LENGTH (operation);
reduction_op = TREE_OPERAND (operation, op_type-1);
vectype = get_vectype_for_scalar_type (TREE_TYPE (reduction_op));
*************** vect_create_epilog_for_reduction (tree v
*** 1709,1715 ****
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &scalar_initial_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
--- 1836,1842 ----
the scalar def before the loop, that defines the initial value
of the reduction variable. */
vec_initial_def = vect_get_vec_def_for_operand (reduction_op, stmt,
! &adjustment_def);
add_phi_arg (reduction_phi, vec_initial_def, loop_preheader_edge (loop));
/* 1.2 set the loop-latch arg for the reduction-phi: */
*************** vect_create_epilog_for_reduction (tree v
*** 1788,1793 ****
--- 1915,1929 ----
bitsize = TYPE_SIZE (scalar_type);
bytesize = TYPE_SIZE_UNIT (scalar_type);
+
+ /* In case this is a reduction in an inner-loop while vectorizing an outer
+ loop - we don't need to extract a single scalar result at the end of the
+ inner-loop. The final vector of partial results will be used in the
+ vectorized outer-loop, or reduced to a scalar result at the end of the
+ outer-loop. */
+ if (nested_in_vect_loop)
+ goto vect_finalize_reduction;
+
/* 2.3 Create the reduction code, using one of the three schemes described
above. */
*************** vect_create_epilog_for_reduction (tree v
*** 1934,1939 ****
--- 2070,2076 ----
{
tree rhs;
+ gcc_assert (!nested_in_vect_loop);
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "extract scalar result");
*************** vect_create_epilog_for_reduction (tree v
*** 1952,1976 ****
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! /* 2.4 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'scalar_initial_def' is zero).
! Create:
! s_out4 = scalar_expr <s_out3, scalar_initial_def> */
!
! if (scalar_initial_def)
{
! tree tmp = build2 (code, scalar_type, new_temp, scalar_initial_def);
! epilog_stmt = build_gimple_modify_stmt (new_scalar_dest, tmp);
! new_temp = make_ssa_name (new_scalar_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
- /* 2.6 Replace uses of s_out0 with uses of s_out3 */
! /* Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
--- 2089,2130 ----
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
}
! vect_finalize_reduction:
!
! /* 2.5 Adjust the final result by the initial value of the reduction
variable. (When such adjustment is not needed, then
! 'adjustment_def' is zero). For example, if code is PLUS we create:
! new_temp = loop_exit_def + adjustment_def */
! if (adjustment_def)
{
! if (nested_in_vect_loop)
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) == VECTOR_TYPE);
! expr = build2 (code, vectype, PHI_RESULT (new_phi), adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, vectype);
! }
! else
! {
! gcc_assert (TREE_CODE (TREE_TYPE (adjustment_def)) != VECTOR_TYPE);
! expr = build2 (code, scalar_type, new_temp, adjustment_def);
! new_dest = vect_create_destination_var (scalar_dest, scalar_type);
! }
! epilog_stmt = build_gimple_modify_stmt (new_dest, expr);
! new_temp = make_ssa_name (new_dest, epilog_stmt);
GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp;
+ #if 0
+ bsi_insert_after (&exit_bsi, epilog_stmt, BSI_NEW_STMT);
+ #else
bsi_insert_before (&exit_bsi, epilog_stmt, BSI_SAME_STMT);
+ #endif
}
! /* 2.6 Handle the loop-exit phi */
!
! /* Replace uses of s_out0 with uses of s_out3:
! Find the loop-closed-use at the loop exit of the original scalar result.
(The reduction result is expected to have two immediate uses - one at the
latch block, and one at the loop exit). */
exit_phi = NULL;
*************** vect_create_epilog_for_reduction (tree v
*** 1984,1989 ****
--- 2138,2166 ----
}
/* We expect to have found an exit_phi because of loop-closed-ssa form. */
gcc_assert (exit_phi);
+
+ if (nested_in_vect_loop)
+ {
+ stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi);
+
+ /* FORNOW. Currently not supporting the case that an inner-loop reduction
+ is not used in the outer-loop (but only outside the outer-loop). */
+ gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo)
+ && !STMT_VINFO_LIVE_P (stmt_vinfo));
+
+ epilog_stmt = adjustment_def ? epilog_stmt : new_phi;
+ STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt;
+ set_stmt_info (get_stmt_ann (epilog_stmt),
+ new_stmt_vec_info (epilog_stmt, loop_vinfo));
+
+ if (vect_print_dump_info (REPORT_DETAILS))
+ {
+ fprintf (vect_dump, "vector of partial results after inner-loop:");
+ print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM);
+ }
+ return;
+ }
+
/* Replace the uses: */
orig_name = PHI_RESULT (exit_phi);
FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name)
*************** vectorizable_reduction (tree stmt, block
*** 2065,2079 ****
tree new_stmt = NULL_TREE;
int j;
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT_P (stmt_info))
return false;
! if (!STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
--- 2242,2271 ----
tree new_stmt = NULL_TREE;
int j;
+ if (nested_in_vect_loop_p (loop, stmt))
+ {
+ loop = loop->inner;
+ /* FORNOW. This restriction should be relaxed. */
+ if (ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+ }
+
gcc_assert (ncopies >= 1);
/* 1. Is vectorizable reduction? */
/* Not supportable if the reduction variable is used in the loop. */
! if (STMT_VINFO_RELEVANT (stmt_info) > vect_used_in_outer)
return false;
! /* Reductions that are not used even in an enclosing outer-loop,
! are expected to be "live" (used out of the loop). */
! if (STMT_VINFO_RELEVANT (stmt_info) == vect_unused_in_loop
! && !STMT_VINFO_LIVE_P (stmt_info))
return false;
/* Make sure it was already recognized as a reduction computation. */
*************** vectorizable_reduction (tree stmt, block
*** 2130,2138 ****
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
--- 2322,2330 ----
gcc_assert (dt == vect_reduction_def);
gcc_assert (TREE_CODE (def_stmt) == PHI_NODE);
if (orig_stmt)
! gcc_assert (orig_stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
else
! gcc_assert (stmt == vect_is_simple_reduction (loop_vinfo, def_stmt));
if (STMT_VINFO_LIVE_P (vinfo_for_stmt (def_stmt)))
return false;
*************** vectorizable_call (tree stmt, block_stmt
*** 2357,2362 ****
--- 2549,2555 ----
int nunits_in;
int nunits_out;
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
tree fndecl, rhs, new_temp, def, def_stmt, rhs_type, lhs_type;
enum vect_def_type dt[2] = {vect_unknown_def_type, vect_unknown_def_type};
tree new_stmt;
*************** vectorizable_call (tree stmt, block_stmt
*** 2466,2471 ****
--- 2659,2672 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!vec_stmt) /* transformation not required. */
{
STMT_VINFO_TYPE (stmt_info) = call_vec_info_type;
*************** vectorizable_call (tree stmt, block_stmt
*** 2480,2485 ****
--- 2681,2694 ----
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "transform operation.");
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Handle def. */
scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
vec_dest = vect_create_destination_var (scalar_dest, vectype_out);
*************** vectorizable_conversion (tree stmt, bloc
*** 2671,2676 ****
--- 2880,2886 ----
tree vec_oprnd0 = NULL_TREE, vec_oprnd1 = NULL_TREE;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
tree new_temp;
*************** vectorizable_conversion (tree stmt, bloc
*** 2752,2757 ****
--- 2962,2975 ----
needs to be generated. */
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
/* Check the operands of the operation. */
if (!vect_is_simple_use (op0, loop_vinfo, &def_stmt, &def, &dt0))
{
*************** vectorizable_operation (tree stmt, block
*** 3093,3098 ****
--- 3311,3317 ----
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code;
enum machine_mode vec_mode;
tree new_temp;
*************** vectorizable_operation (tree stmt, block
*** 3111,3116 ****
--- 3330,3342 ----
int j;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_type_demotion (tree stmt, b
*** 3373,3378 ****
--- 3599,3605 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK;
tree new_temp;
tree def, def_stmt;
*************** vectorizable_type_demotion (tree stmt, b
*** 3425,3430 ****
--- 3652,3664 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_out;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_type_promotion (tree stmt,
*** 3522,3527 ****
--- 3756,3762 ----
tree vec_oprnd0=NULL, vec_oprnd1=NULL;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum tree_code code, code1 = ERROR_MARK, code2 = ERROR_MARK;
tree decl1 = NULL_TREE, decl2 = NULL_TREE;
int op_type;
*************** vectorizable_type_promotion (tree stmt,
*** 3575,3580 ****
--- 3810,3822 ----
ncopies = LOOP_VINFO_VECT_FACTOR (loop_vinfo) / nunits_in;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (! ((INTEGRAL_TYPE_P (TREE_TYPE (scalar_dest))
&& INTEGRAL_TYPE_P (TREE_TYPE (op0)))
*************** vectorizable_store (tree stmt, block_stm
*** 3867,3872 ****
--- 4109,4115 ----
struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info), *first_dr = NULL;
tree vectype = STMT_VINFO_VECTYPE (stmt_info);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
enum machine_mode vec_mode;
tree dummy;
enum dr_alignment_support alignment_support_cheme;
*************** vectorizable_store (tree stmt, block_stm
*** 3882,3887 ****
--- 4125,4137 ----
unsigned int group_size, i;
VEC(tree,heap) *dr_chain = NULL, *oprnds = NULL, *result_chain = NULL;
gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_load (tree stmt, block_stmt
*** 4517,4522 ****
--- 4767,4781 ----
bool strided_load = false;
tree first_stmt;
+ gcc_assert (ncopies >= 1);
+ /* FORNOW. This restriction should be relaxed. */
+ if (nested_in_vect_loop_p (loop, stmt) && ncopies > 1)
+ {
+ if (vect_print_dump_info (REPORT_DETAILS))
+ fprintf (vect_dump, "multiple types in nested loop.");
+ return false;
+ }
+
if (!STMT_VINFO_RELEVANT_P (stmt_info))
return false;
*************** vectorizable_live_operation (tree stmt,
*** 4812,4817 ****
--- 5071,5077 ----
tree operation;
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
int i;
int op_type;
tree op;
*************** vectorizable_live_operation (tree stmt,
*** 4829,4834 ****
--- 5089,5098 ----
if (TREE_CODE (GIMPLE_STMT_OPERAND (stmt, 0)) != SSA_NAME)
return false;
+ /* FORNOW. CHECKME. */
+ if (nested_in_vect_loop_p (loop, stmt))
+ return false;
+
operation = GIMPLE_STMT_OPERAND (stmt, 1);
op_type = TREE_OPERAND_LENGTH (operation);
*************** vect_transform_loop (loop_vec_info loop_
*** 5965,5972 ****
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
stmt_info = vinfo_for_stmt (stmt);
! gcc_assert (stmt_info);
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
--- 6229,6246 ----
fprintf (vect_dump, "------>vectorizing statement: ");
print_generic_expr (vect_dump, stmt, TDF_SLIM);
}
+
stmt_info = vinfo_for_stmt (stmt);
!
! /* vector stmts created in the outer-loop during vectorization of
! stmts in an inner-loop may not have a stmt_info, and do not
! need to be vectorized. */
! if (!stmt_info)
! {
! bsi_next (&si);
! continue;
! }
!
if (!STMT_VINFO_RELEVANT_P (stmt_info)
&& !STMT_VINFO_LIVE_P (stmt_info))
{
*************** vect_transform_loop (loop_vec_info loop_
*** 6038,6041 ****
--- 6312,6317 ----
if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
fprintf (vect_dump, "LOOP VECTORIZED.");
+ if (loop->inner && vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
+ fprintf (vect_dump, "OUTER LOOP VECTORIZED.");
}
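
For readers following the reduction changes above, here is a minimal sketch of what the nested case in vect_create_epilog_for_reduction amounts to, written as plain C rather than GIMPLE. It is an illustration under stated assumptions, not code from the patch: the names VF, N, M, scalar_version, outer_vectorized_model and outer_vect_matches_scalar are hypothetical, and each "vector" is modelled as a small array with one lane per outer-loop iteration. The inner loop has no memory references, matching the restriction stated for this patch. The accumulator starts as a vector of zeros, no scalar is extracted after the inner loop, and the initial value is applied afterwards as a vector adjustment (the role played by a vector adjustment_def when nested_in_vect_loop is true).

```c
#define VF 4   /* hypothetical vectorization factor: outer iterations per vector */
#define N  8   /* outer trip count; a multiple of VF to keep the sketch simple */
#define M  5   /* inner trip count */

/* Scalar reference: an inner-loop sum reduction whose initial value
   depends on the outer index.  The inner loop touches no memory.  */
static void
scalar_version (int out[N])
{
  for (int i = 0; i < N; i++)
    {
      int s = 10 * i;               /* initial value of the reduction */
      for (int j = 0; j < M; j++)
        s += i + j;
      out[i] = s;                   /* store is in the outer loop only */
    }
}

/* Hand-written model of the outer-loop-vectorized form.  Lane l of each
   array stands for outer iteration i + l.  */
static void
outer_vectorized_model (int out[N])
{
  for (int i = 0; i < N; i += VF)
    {
      int adjustment[VF];           /* models a vector adjustment_def */
      int acc[VF];                  /* models the vector reduction phi */

      for (int l = 0; l < VF; l++)
        {
          adjustment[l] = 10 * (i + l);
          acc[l] = 0;               /* vector of zeros as the initial def */
        }

      /* The inner loop stays a scalar loop over j; every statement in it
         operates on all VF lanes.  */
      for (int j = 0; j < M; j++)
        for (int l = 0; l < VF; l++)
          acc[l] += (i + l) + j;

      /* No scalar result is extracted after the inner loop: the vector of
         partial results is adjusted lane by lane and used directly.  */
      for (int l = 0; l < VF; l++)
        out[i + l] = acc[l] + adjustment[l];
    }
}

/* Returns 1 if the vectorized model reproduces the scalar results.  */
int
outer_vect_matches_scalar (void)
{
  int a[N], b[N];

  scalar_version (a);
  outer_vectorized_model (b);
  for (int i = 0; i < N; i++)
    if (a[i] != b[i])
      return 0;
  return 1;
}
```

Calling outer_vect_matches_scalar () from a small driver is enough to check that the two forms agree. In the non-nested case the same adjustment would instead be a single scalar operation applied after the vector of partial sums has been reduced to one value.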