From: Dorit Nuzman <DORIT@il.ibm.com>
To: Dorit Nuzman <DORIT@il.ibm.com>
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [patch] [4.3 projects] outer-loop vectorization patch 1/n
Date: Thu, 09 Aug 2007 11:58:00 -0000
Message-ID: <OF94EE711F.937FD020-ONC2257332.004186D9-C2257332.004205D1@il.ibm.com>
In-Reply-To: <OFAA5B43DE.5A53A693-ONC2257331.00666354-C2257331.0075A438@il.ibm.com>
>
> Hi,
>
> This patch is the first part of
> http://gcc.gnu.org/ml/gcc-patches/2007-08/msg00461.html. It adds initial
> support for outer-loop vectorization. It basically brings over this patch:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00044.html, along with some
> fixes that went in later.
> This patch can vectorize outer-loops only if there are no memory-references
> in the inner-loop.
>
> The patch includes the following changes to the vectorizer:
>
> 1) So far we supported single-BB loops (+empty latch), so the order in
> which we traversed the loop BBs did not matter. Now, it does - we sort the
> BBs in DFS order (since we don't allow if's in the loop, this should
> guarantee visiting defs before their uses).
>
> 2) vect_analyze_loop_form was extended to allow a restricted form of
> outer-loops. We currently support doubly-nested loops that consist of a
> header, a single inner(most)-loop, a tail, and an empty latch (5 BBs all
> together).
>
The following bit was missing from vect_analyze_loop_form: it's not enough
to check that the inner-loop bound is countable; it also needs to be
invariant in the outer-loop:
*************** vect_analyze_loop_form (struct loop *loo
*** 3052,3057 ****
--- 3080,3095 ----
        return NULL;
      }
+     if (!expr_invariant_in_loop_p (loop,
+                                    LOOP_VINFO_NITERS (inner_loop_vinfo)))
+       {
+         if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
+           fprintf (vect_dump,
+                    "not vectorized: inner-loop count not invariant.");
+         destroy_loop_vec_info (inner_loop_vinfo, true);
+         return NULL;
+       }
+
      if (loop->num_nodes != 5)
        {
          if (vect_print_dump_info (REPORT_BAD_FORM_LOOPS))
Index: vect-outer-2d.c
===================================================================
--- vect-outer-2d.c (revision 0)
+++ vect-outer-2d.c (revision 0)
@@ -0,0 +1,41 @@
+/* { dg-require-effective-target vect_float } */
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 40
+float image[N][N][N+1] __attribute__ ((__aligned__(16)));
+
+void
+foo (){
+  int i,j,k;
+
+  for (k=0; k<N; k++) {
+    for (i = 0; i < N; i++) {
+      for (j = 0; j < i+1; j++) {
+        image[k][j][i] = j+i+k;
+      }
+    }
+  }
+}
+
+int main (void)
+{
+  check_vect ();
+  int i, j, k;
+
+  foo ();
+
+  for (k=0; k<N; k++) {
+    for (i = 0; i < N; i++) {
+      for (j = 0; j < i+1; j++) {
+        if (image[k][j][i] != j+i+k)
+          abort ();
+      }
+    }
+  }
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "OUTER LOOP VECTORIZED" 0 "vect" } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
dorit
> 3) vect_analyze_loop_form calls a new function - vect_analyze_loop_1 - to
> do a few analyses on the inner-loop (currently only one analysis:
> analyze_loop_form), and to build a loop_info for the inner-loop. It is
> destroyed soon after, but w/o destroying the stmt_info's that were set up
> for the inner-loop stmts. Maybe later we'll keep the inner-loop_info
> around, if needed.
>
> 4) Support for outer-loops breaks the assumption that phi nodes are only in
> the loop-header, and represent a scalar-cycle (induction or reduction). In
> outer-loops we also have phi-nodes inside the loop - these are the
> loop-closed phis after the inner-loop. This required a way to distinguish
> between these two kinds of phis (we use 'is_loop_header_bb_p' for that),
> and a few small changes in several places:
> o new_stmt_vec_info: different def-type initialization for the two kinds of
> phis
> o vect_is_simple_reduction: the uses that are not the reduction-variable
> can now be defined by a phi, though not a loop-header phi.
> o vect_recog_dot_prod_pattern: a vect_loop_def might be a phi, and not
> necessarily a gimple_modify_stmt.
> o vect_get_vec_def_for_oprnd: a vect_loop_def can be a phi node, and not
> necessarily a gimple_modify_stmt.
>
> 5) the enum "relevant" has two new values -
> vect_used_in_outer[_by_reduction], which are propagated during the
> mark_relevant pass.
>
> 6) since we don't yet support multiple-data-types in the inner-loop, we
> check in all relevant places, that this is not the case.
>
> The more significant changes are to vectorization of reduction and
> induction. In both cases we need to be aware of whether the
> induction/reduction-phi that we are vectorizing is in the same nest that is
> being vectorized, or is 'nested_in_vect_loop' (is inside the inner-loop
> while vectorizing the outer-loop):
>
> 7) vectorization of induction: In get_initial_def_for_induction, if this is
> a 'nested_in_vect_loop' case, then:
> o the initialization vector can be obtained using
> vect_get_vec_def_for_operand (does not need to be built from scratch).
> o the vector that holds the step of the vectorized induction is {S,S,S,S}
> rather than {VF*S,VF*S,VF*S,VF*S} (where S is the step of the induction),
> because in the vectorized inner-loop we are advancing sequentially (though
> in parallel for VF outer-loop iterations).
> o the final vector for inductions is recorded in the corresponding
> loop-exit phi (of the inner-loop) so that we can easily obtain it when we
> vectorize stmts in the outer-loop that use it.
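
To illustrate the two step vectors in 7), here is a rough sketch in plain C.
It is not vectorizer code: VF, induction_vectors and the lane arrays are names
made up for this example, and it assumes the inner-loop induction starts from
a scalar value X that is invariant in the outer-loop.

```c
#define VF 4   /* assumed vectorization factor (hypothetical) */

/* Model the initial vector and the step vector of a vectorized induction
   with scalar start X and scalar step S.  NESTED != 0 stands for the
   'nested_in_vect_loop' case.  */
void
induction_vectors (int x, int s, int nested, int init[VF], int step[VF])
{
  for (int lane = 0; lane < VF; lane++)
    {
      /* Not nested: lane L handles iteration L of the vectorized loop, so
         it starts at X + L*S and jumps VF iterations at a time.
         Nested: each lane runs the whole inner-loop for its own outer-loop
         iteration, so all lanes start at X and advance by S.  */
      init[lane] = nested ? x : x + lane * s;
      step[lane] = nested ? s : VF * s;
    }
}
```

With X = 10 and S = 3, the regular case gives init {10,13,16,19} and step
{12,12,12,12}; the nested case gives init {10,10,10,10} and step {3,3,3,3}.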
>
> 8) vectorization of reduction: The main thing here is that we don't need to
> reduce the reduction to a single result; the final vector of partial
> results will feed the vector operations that may use it in the outer-loop.
> So:
> o In get_initial_def_for_reduction, we may return a vector for the epilog
> adjustment, rather than a scalar.
> o epilog_for_reduction - skip the part that computes the final scalar
> result in case this is a 'nested_in_vect_loop' case.
> o and in vectorizable_reduction, we don't check that the reduction is
> LIVE_P anymore (used out of the loop), because it may not be used outside
> the (outer) loop, but used inside the outer-loop (so as far as the
> inner-loop reduction is concerned, it is used_in_outer_loop, but not live).
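
As a rough picture of 8), the sketch below models in plain C what happens to
an inner-loop sum when the outer-loop is vectorized. It only illustrates the
idea and is not what the vectorizer emits: N, M, VF and both function names
are made up, N is assumed to be a multiple of VF, and the inner-loop has no
memory references, in line with the restriction of this patch.

```c
#define N 16   /* outer trip count; assumed to be a multiple of VF */
#define M 8    /* inner trip count, invariant in the outer-loop */
#define VF 4   /* assumed vectorization factor (hypothetical) */

/* Scalar nest: the inner-loop is a sum reduction used in the outer-loop.  */
void
scalar_nest (int out[N])
{
  for (int i = 0; i < N; i++)
    {
      int sum = 0;
      for (int j = 0; j < M; j++)
        sum += i * j;
      out[i] = sum;
    }
}

/* Model of the outer-loop vectorized nest: lane L of vsum belongs to
   outer-loop iteration i+L.  */
void
outer_vectorized_model (int out[N])
{
  for (int i = 0; i < N; i += VF)
    {
      int vsum[VF] = { 0 };
      for (int j = 0; j < M; j++)      /* the inner-loop still runs M times */
        for (int lane = 0; lane < VF; lane++)
          vsum[lane] += (i + lane) * j;
      /* No reduction to a single scalar here: the whole vector feeds the
         outer-loop stmt that uses it (one result per lane).  */
      for (int lane = 0; lane < VF; lane++)
        out[i + lane] = vsum[lane];
    }
}
```

Both functions fill out[] with the same values; the second one never folds
vsum into one scalar, which is the part of the epilog that is skipped in the
'nested_in_vect_loop' case.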
>
> Bootstrapped on powerpc64-linux,
> bootstrapped with vectorization enabled on i386-linux,
> passed full regression testing on both platforms.
>
> I will wait at least a week to give people a chance to review and comment.
>
> thanks,
> dorit
>
> ChangeLog:
>
> * tree-vectorizer.h (vect_is_simple_reduction): Takes a loop_vec_info
> as argument instead of struct loop.
> (nested_in_vect_loop_p): New function.
> (vect_relevant): Add enum values vect_used_in_outer_by_reduction and
> vect_used_in_outer.
> (is_loop_header_bb_p): New. Used to differentiate loop-header phis
> from other phis in the loop.
> (destroy_loop_vec_info): Add additional argument to declaration.
>
> * tree-vectorizer.c (supportable_widening_operation): Also check if
> nested_in_vect_loop_p (don't allow changing the order in this case).
> (vect_is_simple_reduction): Takes a loop_vec_info as argument instead
> of struct loop. Call nested_in_vect_loop_p and don't require
> flag_unsafe_math_optimizations if it returns true.
> * tree-vectorizer.c (new_stmt_vec_info): When setting def_type for
> phis differentiate loop-header phis from other phis.
> (bb_in_loop_p): New function.
> (new_loop_vec_info): Inner-loop phis already have a stmt_vinfo, so just
> update their loop_vinfo. Order of BB traversal now matters - call
> dfs_enumerate_from with bb_in_loop_p.
> (destroy_loop_vec_info): Takes additional argument to control whether
> stmt_vinfo of the loop stmts should be destroyed as well.
> (vect_is_simple_reduction): Allow the "non-reduction" use of a
> reduction stmt to be defined by a non loop-header phi.
> (vectorize_loops): Call destroy_loop_vec_info with additional argument.
>
> * tree-vect-transform.c (vectorizable_reduction): Call
> nested_in_vect_loop_p. Check for multitypes in the inner-loop.
> (vectorizable_call): Likewise.
> (vectorizable_conversion): Likewise.
> (vectorizable_operation): Likewise.
> (vectorizable_type_promotion): Likewise.
> (vectorizable_type_demotion): Likewise.
> (vectorizable_store): Likewise.
> (vectorizable_live_operation): Likewise.
> (vectorizable_reduction): Likewise. Also pass loop_info to
> vect_is_simple_reduction instead of loop.
> (vect_init_vector): Call nested_in_vect_loop_p.
> (get_initial_def_for_reduction): Likewise.
> (vect_create_epilog_for_reduction): Likewise.
> (vect_init_vector): Check which loop to work with, in case there's an
> inner-loop.
> (get_initial_def_for_induction): Extend to handle outer-loop
> vectorization. Fix indentation.
> (vect_get_vec_def_for_operand): Support phis in the case
> vect_loop_def.
> In the case vect_induction_def get the vector def from the induction
> phi node, instead of calling get_initial_def_for_induction.
> (get_initial_def_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_create_epilog_for_reduction): Extend to handle outer-loop
> vectorization.
> (vect_transform_loop): Change assert to just skip this case. Add a
> dump printout.
> (vect_finish_stmt_generation): Add a couple asserts.
>
> (vect_estimate_min_profitable_iters): Multiply
> cost of inner-loop stmts (in outer-loop vectorization) by estimated
> inner-loop bound.
> (vect_model_reduction_cost): Don't add reduction epilogue cost in case
> this is an inner-loop reduction in outer-loop vectorization.
>
> * tree-vect-analyze.c (vect_analyze_scalar_cycles_1): New function.
> Same code as what used to be vect_analyze_scalar_cycles, only with
> additional argument loop, and loop_info passed to
> vect_is_simple_reduction instead of loop.
> (vect_analyze_scalar_cycles): Code factored out into
> vect_analyze_scalar_cycles_1. Call it for each relevant loop-nest.
> Updated documentation.
> (analyze_operations): Check for inner-loop loop-closed exit-phis during
> outer-loop vectorization that are live or not used in the outer-loop,
> because this requires special handling.
> (vect_enhance_data_refs_alignment): Don't consider versioning for
> nested-loops.
> (vect_analyze_data_refs): Check that there are no datarefs in the
> inner-loop.
> (vect_mark_stmts_to_be_vectorized): Also consider vect_used_in_outer
> and vect_used_in_outer_by_reduction cases.
> (process_use): Also consider the case of outer-loop stmt defining an
> inner-loop stmt and vice versa.
> (vect_analyze_loop_1): New function.
> (vect_analyze_loop_form): Extend, to allow a restricted form of nested
> loops. Call vect_analyze_loop_1.
> (vect_analyze_loop): Skip (inner-)loops within outer-loops that have
> been vectorized. Call destroy_loop_vec_info with additional argument.
>
> * tree-vect-patterns.c (vect_recog_widen_sum_pattern): Don't allow
> in the inner-loop when doing outer-loop vectorization. Add
> documentation and printout.
> (vect_recog_dot_prod_pattern): Likewise. Also add check for
> GIMPLE_MODIFY_STMT (in case we encounter a phi in the loop).
>
> testsuite/ChangeLog:
>
> * gcc.dg/vect/vect.exp: Compile tests with -fno-tree-scev-cprop
> and -fno-tree-reassoc.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-1.c: New test.
> * gcc.dg/vect/no-tree-scev-cprop-vect-iv-2.c: Moved to...
> * gcc.dg/vect/no-scevccp-vect-iv-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-noreassoc-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-1.c: New test.
> * gcc.dg/vect/no-scevccp-outer-2.c: New test.
> * gcc.dg/vect/no-scevccp-outer-3.c: New test.
> * gcc.dg/vect/no-scevccp-outer-4.c: New test.
> * gcc.dg/vect/no-scevccp-outer-5.c: New test.
> * gcc.dg/vect/no-scevccp-outer-6.c: New test.
> * gcc.dg/vect/no-scevccp-outer-7.c: New test.
> * gcc.dg/vect/no-scevccp-outer-8.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-9b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10a.c: New test.
> * gcc.dg/vect/no-scevccp-outer-10b.c: New test.
> * gcc.dg/vect/no-scevccp-outer-11.c: New test.
> * gcc.dg/vect/no-scevccp-outer-12.c: New test.
> * gcc.dg/vect/no-scevccp-outer-13.c: New test.
> * gcc.dg/vect/no-scevccp-outer-14.c: New test.
> * gcc.dg/vect/no-scevccp-outer-15.c: New test.
> * gcc.dg/vect/no-scevccp-outer-16.c: New test.
> * gcc.dg/vect/no-scevccp-outer-17.c: New test.
> * gcc.dg/vect/no-scevccp-outer-18.c: New test.
> * gcc.dg/vect/no-scevccp-outer-19.c: New test.
> * gcc.dg/vect/no-scevccp-outer-20.c: New test.
> * gcc.dg/vect/no-scevccp-outer-21.c: New test.
> * gcc.dg/vect/no-scevccp-outer-22.c: New test.
>
> (See attached file: mainlineouterloopdiff1t.txt)[attachment
> "mainlineouterloopdiff1t.txt" deleted by Dorit Nuzman/Haifa/IBM]