* [patch] [4.3 projects] outer-loop vectorization patch 3/n
@ 2007-08-11 17:10 Dorit Nuzman
0 siblings, 0 replies; only message in thread
From: Dorit Nuzman @ 2007-08-11 17:10 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 4471 bytes --]
Hi,
This part is not specific to outer-loop vectorization, but may needed even
more when doing outer-loop vectorization. This back to the issue raised
here: http://gcc.gnu.org/ml/gcc-patches/2007-06/msg00136.html. In cases
that the compiler can't prove that the loop-count is non-zero, it returns
"scev_not_known", and vectorization fails cause it thinks that the loop is
not countable. Instead, with this patch, we have the number-of-iterations
analysis return both the loop-bound expression and the maybe_zero
expression, and from them build a COND_EXPR that represents the loop count.
i.e. : "lb = may_be_zero ? zero : lb".
The one limitation is that currently the compiler does not let us generate
COND_EXPR as a rhs (at least until this patch is committed:
http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html), so we have to
prevent situations in which the vectorizer would actually generate stmts
using this expression (e.g. peeling the loop).
Bootstrapped on powerpc64-linux,
bootstrapped with vectorization enabled on i386-linux,
passed full regression testing on both platforms.
This part needs approval.
thanks,
dorit
ChangeLog:
* tree-scalar-evolutions.c (number_of_latch_executions_1): New.
Contains code that was factored out from
number_of_latch_executions.
(number_of_latch_executions): Call number_of_latch_executions_1
instead of code that was factored out.
(number_of_exit_cond_executions): Takes additional argument
may_be_zero. Call number_of_latch_executions_1 instead of
number_of_latch_executions.
* tree-scalar-evolutions.h (number_of_exit_cond_executions): Takes
additional argument.
* tree-vect-analyze.c (vect_analyze_operations): Avoid
vectorization
in case the loop-bound is a COND_EXPR and peeling needs to be done.
(vect_get_loop_niters): Call number_of_exit_cond_executions with
additional argument may_be_zero. Use it to build a (potentially)
conditional expression for the loop niters.
testsuite/ChangeLog:
* gcc.dg/vect/vect-outer-fir-lb.c: First loop now gets vectorized.
P.S.1 One example where this helps is in this case:
for (k = 0; k < 4; k++) {
for (i = 0; i < N; i++) {
diff = 0;
j = k;
do {
diff += in[j+i]*coeff[j];
j+=4;
} while (j < M);
out[i] += diff;
}
}
(taken from vect-outer-fir-lb.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00742.html).
Without the patch we get:
"
Analyzing # of iterations of loop 4
exit condition [k_45 + 4, + , 4](no_overflow) <= 63
bounds on difference of bases: -2147483584 ... 2147483707
result:
zero if k_45 > 63
# of iterations (63 - (unsigned int) k_45) /[fl] 4, bounded by
536870927
(set_nb_iterations_in_loop = scev_not_known))
(get_loop_exit_condition
if (j_19 <= 63))
vect-outer-fir-lb.c:33: note: not vectorized: number of iterations cannot
be computed.
"
With the patch, the loop gets vectorized.
Of-course, what would really be good is if range propoagation could help
the number-of-iterations analysis realize that since k<4 it holds that
k<127 and so the loop-bound is never zero.
P.S.2 When the above loop is written this way:
for (k = 0; k < 4; k++) {
for (i = 0; i < N; i++) {
diff = 0;
for (j = k; j < M; j+=4) {
diff += in[j+i]*coeff[j];
}
out[i] += diff;
}
}
(i.e. the inner-loop is a for-loop instead of a do-while loop; taken from
vect-outer-fir.c, as posted here:
http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00729.html), the compiler
returns the following expression:
"
Analyzing # of iterations of loop 4
exit condition [k_45 + 4, + , 4](no_overflow) <= 127
bounds on difference of bases: -4 ... 2147483771
result:
# of iterations (127 - (unsigned int) k_45) /[fl] 4, bounded by
536870943
(set_nb_iterations_in_loop = (127 - (unsigned int) k_45) /[fl] 4))
vect-outer-fir.c:29: note: ==> get_loop_niters:(127 - (unsigned int) k_45)
/[fl] 4 + 1
"
...but although there isn't a maybe_zero part, the compiler still thinks
that the inner-loop may iterate 0 times, and so it generates a guard code
around it to skip the inner-loop in case it's loop-bound it 0 (which the
outer-loop vectorizer doesn't like...):
outer-loop-header:
...
if (k_45 <= 127)
then goto inner-loop
else
goto outer-loop tail
inner-loop:
...
outer-loop tail:
...
(See attached file: p4.txt)
[-- Attachment #2: p4.txt --]
[-- Type: text/plain, Size: 8246 bytes --]
*** tree-vect-analyze.c.p3 2007-08-07 09:32:38.000000000 +0300
--- tree-vect-analyze.c 2007-08-09 14:34:58.000000000 +0300
*************** vect_analyze_operations (loop_vec_info l
*** 551,561 ****
"not vectorized: can't create epilog loop 1.");
return false;
}
if (!slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump,
! "not vectorized: can't create epilog loop 2.");
return false;
}
}
--- 551,570 ----
"not vectorized: can't create epilog loop 1.");
return false;
}
+ if (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+ && LOOP_VINFO_NITERS (loop_vinfo)
+ && TREE_CODE (LOOP_VINFO_NITERS (loop_vinfo)) == COND_EXPR)
+ {
+ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
+ fprintf (vect_dump,
+ "not vectorized: can't create epilog loop 2.");
+ return false;
+ }
if (!slpeel_can_duplicate_loop_p (loop, single_exit (loop)))
{
if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
fprintf (vect_dump,
! "not vectorized: can't create epilog loop 3.");
return false;
}
}
*************** static tree
*** 2911,2927 ****
vect_get_loop_niters (struct loop *loop, tree *number_of_iterations)
{
tree niters;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== get_loop_niters ===");
! niters = number_of_exit_cond_executions (loop);
! if (niters != NULL_TREE
! && niters != chrec_dont_know)
{
! *number_of_iterations = niters;
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "==> get_loop_niters:" );
--- 2920,2955 ----
vect_get_loop_niters (struct loop *loop, tree *number_of_iterations)
{
tree niters;
+ tree may_be_zero;
if (vect_print_dump_info (REPORT_DETAILS))
fprintf (vect_dump, "=== get_loop_niters ===");
! niters = number_of_exit_cond_executions (loop, &may_be_zero);
! if (niters != NULL_TREE && niters != chrec_dont_know)
{
! tree type = TREE_TYPE (niters);
! tree zero = build_int_cst (type, 0);
+ *number_of_iterations = niters;
+ if (may_be_zero)
+ {
+ if (integer_nonzerop (may_be_zero))
+ *number_of_iterations = zero;
+ else if (integer_zerop (may_be_zero))
+ ;
+ else
+ {
+ /* CHECKME: other things to check? */
+ if (COMPARISON_CLASS_P (may_be_zero))
+ *number_of_iterations =
+ build3 (COND_EXPR, type, may_be_zero, zero, niters);
+ else
+ *number_of_iterations = chrec_dont_know;
+ }
+ }
+
if (vect_print_dump_info (REPORT_DETAILS))
{
fprintf (vect_dump, "==> get_loop_niters:" );
Index: tree-scalar-evolution.c
===================================================================
*** tree-scalar-evolution.c (revision 127356)
--- tree-scalar-evolution.c (working copy)
*************** resolve_mixers (struct loop *loop, tree
*** 2282,2299 ****
and EF (6) = 53. In this case the value of 'b' on exit is '53' and
the loop body has been executed 6 times. */
! tree
! number_of_latch_executions (struct loop *loop)
{
! tree res, type;
edge exit;
struct tree_niter_desc niter_desc;
/* Determine whether the number_of_iterations_in_loop has already
been computed. */
res = loop->nb_iterations;
if (res)
return res;
res = chrec_dont_know;
if (dump_file && (dump_flags & TDF_DETAILS))
--- 2282,2302 ----
and EF (6) = 53. In this case the value of 'b' on exit is '53' and
the loop body has been executed 6 times. */
! static tree
! number_of_latch_executions_1 (struct loop *loop, tree *may_be_zero)
{
! tree res;
edge exit;
struct tree_niter_desc niter_desc;
+ *may_be_zero = NULL_TREE;
+
/* Determine whether the number_of_iterations_in_loop has already
been computed. */
res = loop->nb_iterations;
if (res)
return res;
+
res = chrec_dont_know;
if (dump_file && (dump_flags & TDF_DETAILS))
*************** number_of_latch_executions (struct loop
*** 2306,2323 ****
if (!number_of_iterations_exit (loop, exit, &niter_desc, false))
goto end;
! type = TREE_TYPE (niter_desc.niter);
! if (integer_nonzerop (niter_desc.may_be_zero))
! res = build_int_cst (type, 0);
! else if (integer_zerop (niter_desc.may_be_zero))
! res = niter_desc.niter;
! else
! res = chrec_dont_know;
end:
return set_nb_iterations_in_loop (loop, res);
}
/* Returns the number of executions of the exit condition of LOOP,
i.e., the number by one higher than number_of_latch_executions.
Note that unline number_of_latch_executions, this number does
--- 2309,2344 ----
if (!number_of_iterations_exit (loop, exit, &niter_desc, false))
goto end;
! *may_be_zero = niter_desc.may_be_zero;
! res = niter_desc.niter;
end:
return set_nb_iterations_in_loop (loop, res);
}
+
+ tree
+ number_of_latch_executions (struct loop *loop)
+ {
+ tree res, type;
+ tree may_be_zero;
+
+ res = number_of_latch_executions_1 (loop, &may_be_zero);
+
+ if (may_be_zero)
+ {
+ type = TREE_TYPE (res);
+ if (integer_nonzerop (may_be_zero))
+ res = build_int_cst (type, 0);
+ else if (integer_zerop (may_be_zero))
+ ;
+ else
+ res = chrec_dont_know;
+ }
+
+ return res;
+ }
+
/* Returns the number of executions of the exit condition of LOOP,
i.e., the number by one higher than number_of_latch_executions.
Note that unline number_of_latch_executions, this number does
*************** end:
*** 2329,2337 ****
the possible overflow. */
tree
! number_of_exit_cond_executions (struct loop *loop)
{
! tree ret = number_of_latch_executions (loop);
tree type = chrec_type (ret);
if (chrec_contains_undetermined (ret))
--- 2350,2358 ----
the possible overflow. */
tree
! number_of_exit_cond_executions (struct loop *loop, tree *may_be_zero)
{
! tree ret = number_of_latch_executions_1 (loop, may_be_zero);
tree type = chrec_type (ret);
if (chrec_contains_undetermined (ret))
Index: tree-scalar-evolution.h
===================================================================
*** tree-scalar-evolution.h (revision 127356)
--- tree-scalar-evolution.h (working copy)
*************** along with GCC; see the file COPYING3.
*** 22,28 ****
#define GCC_TREE_SCALAR_EVOLUTION_H
extern tree number_of_latch_executions (struct loop *);
! extern tree number_of_exit_cond_executions (struct loop *);
extern tree get_loop_exit_condition (struct loop *);
extern void scev_initialize (void);
--- 22,28 ----
#define GCC_TREE_SCALAR_EVOLUTION_H
extern tree number_of_latch_executions (struct loop *);
! extern tree number_of_exit_cond_executions (struct loop *, tree *);
extern tree get_loop_exit_condition (struct loop *);
extern void scev_initialize (void);
*** testsuite/gcc.dg/vect/vect-outer-fir-lb.c 2007-08-11 19:48:20.000000000 +0300
--- testsuite/gcc.dg/vect/vect-outer-fir-lb.c.p4 2007-08-11 19:48:36.000000000 +0300
*************** float out[N];
*** 11,19 ****
float fir_out[N];
/* Should be vectorized. Fixed misaligment in the inner-loop. */
- /* Currently not vectorized because the loop-count for the inner-loop
- has a maybe_zero component. Will be fixed when we incorporate the
- "cond_expr in rhs" patch. */
void foo (){
int i,j,k;
float diff;
--- 11,16 ----
*************** int main (void)
*** 75,80 ****
return 0;
}
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail *-*-* } } } */
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 1 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
--- 72,76 ----
return 0;
}
! /* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 2 "vect" { xfail vect_no_align } } } */
/* { dg-final { cleanup-tree-dump "vect" } } */
^ permalink raw reply [flat|nested] only message in thread
only message in thread, other threads:[~2007-08-11 17:10 UTC | newest]
Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-08-11 17:10 [patch] [4.3 projects] outer-loop vectorization patch 3/n Dorit Nuzman
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).