* [PATCH] Make BB vectorizer work on sub-BBs
@ 2015-11-06 9:43 Richard Biener
2015-11-06 11:10 ` Richard Biener
0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-11-06 9:43 UTC (permalink / raw)
To: gcc-patches
The following patch makes the BB vectorizer not only handle BB heads
(until the first stmt with a data reference it cannot handle) but
arbitrary regions in a BB separated by such stmts.
This improves the number of BB vectorizations from 469 to 556
in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
1x481.wrf failing both patched and unpatched (have to update my
config used for such experiments it seems ...)
Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
I'm currently re-testing for a cosmetic change I made when writing
the changelog.
I expected (and there are) some issues with compile-time. Left
is unpatched and right is patched.
'403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
'483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
'416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
'435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
'447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
'453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
'454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
'481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
other benchmarks are unchanged. I'm double-checking now that a followup
patch I have which re-implements BB vectorization dependence checking
fixes this (that's the only quadraticness I know of).
Richard.
2015-11-06 Richard Biener <rguenther@suse.de>
* tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
members.
(vect_stmt_in_region_p): Declare.
* tree-vect-slp.c (new_bb_vec_info): Work on a region.
(destroy_bb_vec_info): Likewise.
(vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
(vect_get_and_check_slp_defs): Likewise.
(vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
(vect_slp_bb): Likewise.
* tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
in terms of vect_stmt_in_region_p.
(vect_pattern_recog): Iterate over the BB region.
* tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
* tree-vectorizer.c (vect_stmt_in_region_p): New function.
(pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
* config/i386/i386.c: Include gimple-iterator.h.
* config/aarch64/aarch64.c: Likewise.
* gcc.dg/vect/bb-slp-38.c: New testcase.
Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
*************** nested_in_vect_loop_p (struct loop *loop
*** 390,395 ****
--- 390,397 ----
typedef struct _bb_vec_info : public vec_info
{
basic_block bb;
+ gimple_stmt_iterator region_begin;
+ gimple_stmt_iterator region_end;
} *bb_vec_info;
#define BB_VINFO_BB(B) (B)->bb
*************** void vect_pattern_recog (vec_info *);
*** 1085,1089 ****
--- 1087,1092 ----
/* In tree-vectorizer.c. */
unsigned vectorize_loops (void);
void vect_destroy_datarefs (vec_info *);
+ bool vect_stmt_in_region_p (vec_info *, gimple *);
#endif /* GCC_TREE_VECTORIZER_H */
Index: gcc/tree-vect-slp.c
===================================================================
*** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
*************** vect_get_and_check_slp_defs (vec_info *v
*** 209,215 ****
unsigned int i, number_of_oprnds;
gimple *def_stmt;
enum vect_def_type dt = vect_uninitialized_def;
- struct loop *loop = NULL;
bool pattern = false;
slp_oprnd_info oprnd_info;
int first_op_idx = 1;
--- 209,214 ----
*************** vect_get_and_check_slp_defs (vec_info *v
*** 218,226 ****
bool first = stmt_num == 0;
bool second = stmt_num == 1;
- if (is_a <loop_vec_info> (vinfo))
- loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
-
if (is_gimple_call (stmt))
{
number_of_oprnds = gimple_call_num_args (stmt);
--- 217,222 ----
*************** again:
*** 276,286 ****
from the pattern. Check that all the stmts of the node are in the
pattern. */
if (def_stmt && gimple_bb (def_stmt)
! && ((is_a <loop_vec_info> (vinfo)
! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
! || (is_a <bb_vec_info> (vinfo)
! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
! && gimple_code (def_stmt) != GIMPLE_PHI))
&& vinfo_for_stmt (def_stmt)
&& STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
&& !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
--- 272,278 ----
from the pattern. Check that all the stmts of the node are in the
pattern. */
if (def_stmt && gimple_bb (def_stmt)
! && vect_stmt_in_region_p (vinfo, def_stmt)
&& vinfo_for_stmt (def_stmt)
&& STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
&& !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
*************** vect_detect_hybrid_slp (loop_vec_info lo
*** 2076,2091 ****
stmt_vec_info structs for all the stmts in it. */
static bb_vec_info
! new_bb_vec_info (basic_block bb)
{
bb_vec_info res = NULL;
gimple_stmt_iterator gsi;
res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
res->kind = vec_info::bb;
BB_VINFO_BB (res) = bb;
! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
{
gimple *stmt = gsi_stmt (gsi);
gimple_set_uid (stmt, 0);
--- 2068,2088 ----
stmt_vec_info structs for all the stmts in it. */
static bb_vec_info
! new_bb_vec_info (gimple_stmt_iterator region_begin,
! gimple_stmt_iterator region_end)
{
+ basic_block bb = gsi_bb (region_begin);
bb_vec_info res = NULL;
gimple_stmt_iterator gsi;
res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
res->kind = vec_info::bb;
BB_VINFO_BB (res) = bb;
+ res->region_begin = region_begin;
+ res->region_end = region_end;
! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
! gsi_next (&gsi))
{
gimple *stmt = gsi_stmt (gsi);
gimple_set_uid (stmt, 0);
*************** destroy_bb_vec_info (bb_vec_info bb_vinf
*** 2118,2124 ****
bb = BB_VINFO_BB (bb_vinfo);
! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
{
gimple *stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
--- 2115,2122 ----
bb = BB_VINFO_BB (bb_vinfo);
! for (si = bb_vinfo->region_begin;
! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
{
gimple *stmt = gsi_stmt (si);
stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
*************** destroy_bb_vec_info (bb_vec_info bb_vinf
*** 2126,2131 ****
--- 2124,2132 ----
if (stmt_info)
/* Free stmt_vec_info. */
free_stmt_vec_info (stmt);
+
+ /* Reset region marker. */
+ gimple_set_uid (stmt, -1);
}
vect_destroy_datarefs (bb_vinfo);
*************** vect_bb_slp_scalar_cost (basic_block bb,
*** 2247,2254 ****
gimple *use_stmt;
FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
if (!is_gimple_debug (use_stmt)
! && (gimple_code (use_stmt) == GIMPLE_PHI
! || gimple_bb (use_stmt) != bb
|| !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
{
(*life)[i] = true;
--- 2248,2255 ----
gimple *use_stmt;
FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
if (!is_gimple_debug (use_stmt)
! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
! use_stmt)
|| !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
{
(*life)[i] = true;
*************** vect_bb_vectorization_profitable_p (bb_v
*** 2327,2366 ****
/* Check if the basic block can be vectorized. */
static bb_vec_info
! vect_slp_analyze_bb_1 (basic_block bb)
{
bb_vec_info bb_vinfo;
vec<slp_instance> slp_instances;
slp_instance instance;
int i;
int min_vf = 2;
- unsigned n_stmts = 0;
! bb_vinfo = new_bb_vec_info (bb);
if (!bb_vinfo)
return NULL;
! /* Gather all data references in the basic-block. */
!
! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
! !gsi_end_p (gsi); gsi_next (&gsi))
! {
! gimple *stmt = gsi_stmt (gsi);
! if (is_gimple_debug (stmt))
! continue;
! ++n_stmts;
! if (!find_data_references_in_stmt (NULL, stmt,
! &BB_VINFO_DATAREFS (bb_vinfo)))
! {
! /* Mark the rest of the basic-block as unvectorizable. */
! for (; !gsi_end_p (gsi); gsi_next (&gsi))
! {
! stmt = gsi_stmt (gsi);
! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
! }
! break;
! }
! }
/* Analyze the data references. */
--- 2328,2358 ----
/* Check if the basic block can be vectorized. */
static bb_vec_info
! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
! gimple_stmt_iterator region_end,
! vec<data_reference_p> datarefs, int n_stmts)
{
bb_vec_info bb_vinfo;
vec<slp_instance> slp_instances;
slp_instance instance;
int i;
int min_vf = 2;
! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
! {
! if (dump_enabled_p ())
! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
! "not vectorized: too many instructions in "
! "basic block.\n");
! free_data_refs (datarefs);
! return NULL;
! }
!
! bb_vinfo = new_bb_vec_info (region_begin, region_end);
if (!bb_vinfo)
return NULL;
! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
/* Analyze the data references. */
*************** vect_slp_analyze_bb_1 (basic_block bb)
*** 2438,2445 ****
}
/* Mark all the statements that we do not want to vectorize. */
! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
! !gsi_end_p (gsi); gsi_next (&gsi))
{
stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
if (STMT_SLP_TYPE (vinfo) != pure_slp)
--- 2430,2437 ----
}
/* Mark all the statements that we do not want to vectorize. */
! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
{
stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
if (STMT_SLP_TYPE (vinfo) != pure_slp)
*************** bool
*** 2509,2585 ****
vect_slp_bb (basic_block bb)
{
bb_vec_info bb_vinfo;
- int insns = 0;
gimple_stmt_iterator gsi;
unsigned int vector_sizes;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
- for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
- {
- gimple *stmt = gsi_stmt (gsi);
- if (!is_gimple_debug (stmt)
- && !gimple_nop_p (stmt)
- && gimple_code (stmt) != GIMPLE_LABEL)
- insns++;
- if (gimple_location (stmt) != UNKNOWN_LOCATION)
- vect_location = gimple_location (stmt);
- }
-
- if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
- {
- if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
- "not vectorized: too many instructions in "
- "basic block.\n");
-
- return false;
- }
-
/* Autodetect first vector size we try. */
current_vector_size = 0;
vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
while (1)
{
! bb_vinfo = vect_slp_analyze_bb_1 (bb);
! if (bb_vinfo)
{
! if (!dbg_cnt (vect_slp))
! {
! destroy_bb_vec_info (bb_vinfo);
! return false;
! }
if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
vect_schedule_slp (bb_vinfo);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
! "BASIC BLOCK VECTORIZED\n");
destroy_bb_vec_info (bb_vinfo);
! return true;
}
! destroy_bb_vec_info (bb_vinfo);
vector_sizes &= ~current_vector_size;
! if (vector_sizes == 0
! || current_vector_size == 0)
! return false;
! /* Try the next biggest vector size. */
! current_vector_size = 1 << floor_log2 (vector_sizes);
! if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location,
! "***** Re-trying analysis with "
! "vector size %d\n", current_vector_size);
}
}
--- 2501,2605 ----
vect_slp_bb (basic_block bb)
{
bb_vec_info bb_vinfo;
gimple_stmt_iterator gsi;
unsigned int vector_sizes;
+ bool any_vectorized = false;
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
/* Autodetect first vector size we try. */
current_vector_size = 0;
vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
+ gsi = gsi_start_bb (bb);
+
while (1)
{
! if (gsi_end_p (gsi))
! break;
!
! gimple_stmt_iterator region_begin = gsi;
! vec<data_reference_p> datarefs = vNULL;
! int insns = 0;
!
! for (; !gsi_end_p (gsi); gsi_next (&gsi))
{
! gimple *stmt = gsi_stmt (gsi);
! if (is_gimple_debug (stmt))
! continue;
! insns++;
!
! if (gimple_location (stmt) != UNKNOWN_LOCATION)
! vect_location = gimple_location (stmt);
!
! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
! break;
! }
!
! /* Skip leading unhandled stmts. */
! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
! {
! gsi_next (&gsi);
! continue;
! }
!
! gimple_stmt_iterator region_end = gsi;
+ bool vectorized = false;
+ bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
+ datarefs, insns);
+ if (bb_vinfo
+ && dbg_cnt (vect_slp))
+ {
if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
vect_schedule_slp (bb_vinfo);
if (dump_enabled_p ())
dump_printf_loc (MSG_NOTE, vect_location,
! "basic block part vectorized\n");
destroy_bb_vec_info (bb_vinfo);
! vectorized = true;
}
+ else
+ destroy_bb_vec_info (bb_vinfo);
! any_vectorized |= vectorized;
vector_sizes &= ~current_vector_size;
! if (vectorized
! || vector_sizes == 0
! || current_vector_size == 0)
! {
! if (gsi_end_p (region_end))
! break;
!
! /* Skip the unhandled stmt. */
! gsi_next (&gsi);
!
! /* And reset vector sizes. */
! current_vector_size = 0;
! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
! }
! else
! {
! /* Try the next biggest vector size. */
! current_vector_size = 1 << floor_log2 (vector_sizes);
! if (dump_enabled_p ())
! dump_printf_loc (MSG_NOTE, vect_location,
! "***** Re-trying analysis with "
! "vector size %d\n", current_vector_size);
! /* Start over. */
! gsi = region_begin;
! }
}
+
+ return any_vectorized;
}
Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
*************** static bool
*** 107,133 ****
vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
!
! if (!gimple_bb (stmt2))
! return false;
!
! if (loop_vinfo)
! {
! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
! return false;
! }
! else
! {
! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
! || gimple_code (stmt2) == GIMPLE_PHI)
! return false;
! }
!
! gcc_assert (vinfo_for_stmt (stmt2));
! return true;
}
/* If the LHS of DEF_STMT has a single use, and that statement is
--- 107,113 ----
vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
{
stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
}
/* If the LHS of DEF_STMT has a single use, and that statement is
*************** vect_pattern_recog (vec_info *vinfo)
*** 3611,3643 ****
loop = LOOP_VINFO_LOOP (loop_vinfo);
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
}
else
{
! bbs = &as_a <bb_vec_info> (vinfo)->bb;
! nbbs = 1;
! }
!
! /* Scan through the loop stmts, applying the pattern recognition
! functions starting at each stmt visited: */
! for (i = 0; i < nbbs; i++)
! {
! basic_block bb = bbs[i];
! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
! {
! if (is_a <bb_vec_info> (vinfo)
! && (stmt = gsi_stmt (si))
&& vinfo_for_stmt (stmt)
&& !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
! continue;
! /* Scan over all generic vect_recog_xxx_pattern functions. */
! for (j = 0; j < NUM_PATTERNS; j++)
! {
vect_recog_func = vect_vect_recog_func_ptrs[j];
vect_pattern_recog_1 (vect_recog_func, si,
&stmts_to_replace);
! }
! }
}
}
--- 3591,3632 ----
loop = LOOP_VINFO_LOOP (loop_vinfo);
bbs = LOOP_VINFO_BBS (loop_vinfo);
nbbs = loop->num_nodes;
+
+ /* Scan through the loop stmts, applying the pattern recognition
+ functions starting at each stmt visited: */
+ for (i = 0; i < nbbs; i++)
+ {
+ basic_block bb = bbs[i];
+ for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+ {
+ /* Scan over all generic vect_recog_xxx_pattern functions. */
+ for (j = 0; j < NUM_PATTERNS; j++)
+ {
+ vect_recog_func = vect_vect_recog_func_ptrs[j];
+ vect_pattern_recog_1 (vect_recog_func, si,
+ &stmts_to_replace);
+ }
+ }
+ }
}
else
{
! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
! for (si = bb_vinfo->region_begin;
! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
! {
! if ((stmt = gsi_stmt (si))
&& vinfo_for_stmt (stmt)
&& !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
! continue;
! /* Scan over all generic vect_recog_xxx_pattern functions. */
! for (j = 0; j < NUM_PATTERNS; j++)
! {
vect_recog_func = vect_vect_recog_func_ptrs[j];
vect_pattern_recog_1 (vect_recog_func, si,
&stmts_to_replace);
! }
! }
}
}
Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
--- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
*************** along with GCC; see the file COPYING3.
*** 64,69 ****
--- 64,70 ----
#include "context.h"
#include "pass_manager.h"
#include "target-globals.h"
+ #include "gimple-iterator.h"
#include "tree-vectorizer.h"
#include "shrink-wrap.h"
#include "builtins.h"
Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
===================================================================
*** /dev/null 1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100
***************
*** 0 ****
--- 1,44 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include "tree-vect.h"
+
+ extern void abort (void);
+
+ int a[8], b[8];
+ int x;
+
+ void __attribute__((noinline,noclone))
+ bar (void)
+ {
+ x = 1;
+ }
+
+ void __attribute__((noinline,noclone))
+ foo(void)
+ {
+ a[0] = b[0];
+ a[1] = b[0];
+ a[2] = b[3];
+ a[3] = b[3];
+ bar ();
+ a[4] = b[4];
+ a[5] = b[7];
+ a[6] = b[4];
+ a[7] = b[7];
+ }
+
+ int main()
+ {
+ int i;
+ check_vect ();
+ for (i = 0; i < 8; ++i)
+ b[i] = i;
+ foo ();
+ if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
+ || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
+ abort ();
+ return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */
+ /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } */
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
--- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
*************** vect_is_simple_use (tree operand, vec_in
*** 8196,8207 ****
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
}
! basic_block bb = gimple_bb (*def_stmt);
! if ((is_a <loop_vec_info> (vinfo)
! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb))
! || (is_a <bb_vec_info> (vinfo)
! && (bb != as_a <bb_vec_info> (vinfo)->bb
! || gimple_code (*def_stmt) == GIMPLE_PHI)))
*dt = vect_external_def;
else
{
--- 8196,8202 ----
dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
}
! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
*dt = vect_external_def;
else
{
Index: gcc/tree-vectorizer.c
===================================================================
*** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
--- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
*************** vect_destroy_datarefs (vec_info *vinfo)
*** 350,355 ****
--- 350,382 ----
}
+ /* Return whether STMT is inside the region we try to vectorize. */
+
+ bool
+ vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
+ {
+ if (!gimple_bb (stmt))
+ return false;
+
+ if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+ {
+ struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+ if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+ return false;
+ }
+ else
+ {
+ bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+ if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
+ || gimple_uid (stmt) == -1U
+ || gimple_code (stmt) == GIMPLE_PHI)
+ return false;
+ }
+
+ return true;
+ }
+
+
/* If LOOP has been versioned during ifcvt, return the internal call
guarding it. */
*************** pass_slp_vectorize::execute (function *f
*** 692,697 ****
--- 719,732 ----
scev_initialize ();
}
+ /* Mark all stmts as not belonging to the current region. */
+ FOR_EACH_BB_FN (bb, fun)
+ {
+ for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+ gsi_next (&gsi))
+ gimple_set_uid (gsi_stmt (gsi), -1);
+ }
+
init_stmt_vec_info_vec ();
FOR_EACH_BB_FN (bb, fun)
Index: gcc/config/aarch64/aarch64.c
===================================================================
*** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100
--- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
***************
*** 52,57 ****
--- 52,58 ----
#include "params.h"
#include "gimplify.h"
#include "dwarf2.h"
+ #include "gimple-iterator.h"
#include "tree-vectorizer.h"
#include "aarch64-cost-tables.h"
#include "dumpfile.h"
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-06 9:43 [PATCH] Make BB vectorizer work on sub-BBs Richard Biener
@ 2015-11-06 11:10 ` Richard Biener
2015-11-06 11:12 ` Kyrill Tkachov
2015-11-06 16:13 ` Jeff Law
0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2015-11-06 11:10 UTC (permalink / raw)
To: gcc-patches
On Fri, 6 Nov 2015, Richard Biener wrote:
>
> The following patch makes the BB vectorizer not only handle BB heads
> (until the first stmt with a data reference it cannot handle) but
> arbitrary regions in a BB separated by such stmts.
>
> This improves the number of BB vectorizations from 469 to 556
> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
> 1x481.wrf failing both patched and unpatched (have to update my
> config used for such experiments it seems ...)
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>
> I'm currently re-testing for a cosmetic change I made when writing
> the changelog.
>
> I expected (and there are) some issues with compile-time. Left
> is unpatched and right is patched.
>
> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>
> other benchmarks are unchanged. I'm double-checking now that a followup
> patch I have which re-implements BB vectorization dependence checking
> fixes this (that's the only quadraticness I know of).
Fixes all but
'453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
it even improves compile-time on some:
'464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
it also increases the number of vectorized BBs to 722.
Needs some work still though.
Richard.
> Richard.
>
> 2015-11-06 Richard Biener <rguenther@suse.de>
>
> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
> members.
> (vect_stmt_in_region_p): Declare.
> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
> (destroy_bb_vec_info): Likewise.
> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
> (vect_get_and_check_slp_defs): Likewise.
> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
> (vect_slp_bb): Likewise.
> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
> in terms of vect_stmt_in_region_p.
> (vect_pattern_recog): Iterate over the BB region.
> * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>
> * config/i386/i386.c: Include gimple-iterator.h.
> * config/aarch64/aarch64.c: Likewise.
>
> * gcc.dg/vect/bb-slp-38.c: New testcase.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> *************** nested_in_vect_loop_p (struct loop *loop
> *** 390,395 ****
> --- 390,397 ----
> typedef struct _bb_vec_info : public vec_info
> {
> basic_block bb;
> + gimple_stmt_iterator region_begin;
> + gimple_stmt_iterator region_end;
> } *bb_vec_info;
>
> #define BB_VINFO_BB(B) (B)->bb
> *************** void vect_pattern_recog (vec_info *);
> *** 1085,1089 ****
> --- 1087,1092 ----
> /* In tree-vectorizer.c. */
> unsigned vectorize_loops (void);
> void vect_destroy_datarefs (vec_info *);
> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>
> #endif /* GCC_TREE_VECTORIZER_H */
> Index: gcc/tree-vect-slp.c
> ===================================================================
> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
> *************** vect_get_and_check_slp_defs (vec_info *v
> *** 209,215 ****
> unsigned int i, number_of_oprnds;
> gimple *def_stmt;
> enum vect_def_type dt = vect_uninitialized_def;
> - struct loop *loop = NULL;
> bool pattern = false;
> slp_oprnd_info oprnd_info;
> int first_op_idx = 1;
> --- 209,214 ----
> *************** vect_get_and_check_slp_defs (vec_info *v
> *** 218,226 ****
> bool first = stmt_num == 0;
> bool second = stmt_num == 1;
>
> - if (is_a <loop_vec_info> (vinfo))
> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
> -
> if (is_gimple_call (stmt))
> {
> number_of_oprnds = gimple_call_num_args (stmt);
> --- 217,222 ----
> *************** again:
> *** 276,286 ****
> from the pattern. Check that all the stmts of the node are in the
> pattern. */
> if (def_stmt && gimple_bb (def_stmt)
> ! && ((is_a <loop_vec_info> (vinfo)
> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> ! || (is_a <bb_vec_info> (vinfo)
> ! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
> ! && gimple_code (def_stmt) != GIMPLE_PHI))
> && vinfo_for_stmt (def_stmt)
> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> --- 272,278 ----
> from the pattern. Check that all the stmts of the node are in the
> pattern. */
> if (def_stmt && gimple_bb (def_stmt)
> ! && vect_stmt_in_region_p (vinfo, def_stmt)
> && vinfo_for_stmt (def_stmt)
> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> *************** vect_detect_hybrid_slp (loop_vec_info lo
> *** 2076,2091 ****
> stmt_vec_info structs for all the stmts in it. */
>
> static bb_vec_info
> ! new_bb_vec_info (basic_block bb)
> {
> bb_vec_info res = NULL;
> gimple_stmt_iterator gsi;
>
> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> res->kind = vec_info::bb;
> BB_VINFO_BB (res) = bb;
>
> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> {
> gimple *stmt = gsi_stmt (gsi);
> gimple_set_uid (stmt, 0);
> --- 2068,2088 ----
> stmt_vec_info structs for all the stmts in it. */
>
> static bb_vec_info
> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
> ! gimple_stmt_iterator region_end)
> {
> + basic_block bb = gsi_bb (region_begin);
> bb_vec_info res = NULL;
> gimple_stmt_iterator gsi;
>
> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> res->kind = vec_info::bb;
> BB_VINFO_BB (res) = bb;
> + res->region_begin = region_begin;
> + res->region_end = region_end;
>
> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
> ! gsi_next (&gsi))
> {
> gimple *stmt = gsi_stmt (gsi);
> gimple_set_uid (stmt, 0);
> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> *** 2118,2124 ****
>
> bb = BB_VINFO_BB (bb_vinfo);
>
> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> {
> gimple *stmt = gsi_stmt (si);
> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> --- 2115,2122 ----
>
> bb = BB_VINFO_BB (bb_vinfo);
>
> ! for (si = bb_vinfo->region_begin;
> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
> {
> gimple *stmt = gsi_stmt (si);
> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> *** 2126,2131 ****
> --- 2124,2132 ----
> if (stmt_info)
> /* Free stmt_vec_info. */
> free_stmt_vec_info (stmt);
> +
> + /* Reset region marker. */
> + gimple_set_uid (stmt, -1);
> }
>
> vect_destroy_datarefs (bb_vinfo);
> *************** vect_bb_slp_scalar_cost (basic_block bb,
> *** 2247,2254 ****
> gimple *use_stmt;
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> if (!is_gimple_debug (use_stmt)
> ! && (gimple_code (use_stmt) == GIMPLE_PHI
> ! || gimple_bb (use_stmt) != bb
> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
> {
> (*life)[i] = true;
> --- 2248,2255 ----
> gimple *use_stmt;
> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> if (!is_gimple_debug (use_stmt)
> ! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
> ! use_stmt)
> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
> {
> (*life)[i] = true;
> *************** vect_bb_vectorization_profitable_p (bb_v
> *** 2327,2366 ****
> /* Check if the basic block can be vectorized. */
>
> static bb_vec_info
> ! vect_slp_analyze_bb_1 (basic_block bb)
> {
> bb_vec_info bb_vinfo;
> vec<slp_instance> slp_instances;
> slp_instance instance;
> int i;
> int min_vf = 2;
> - unsigned n_stmts = 0;
>
> ! bb_vinfo = new_bb_vec_info (bb);
> if (!bb_vinfo)
> return NULL;
>
> ! /* Gather all data references in the basic-block. */
> !
> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> ! !gsi_end_p (gsi); gsi_next (&gsi))
> ! {
> ! gimple *stmt = gsi_stmt (gsi);
> ! if (is_gimple_debug (stmt))
> ! continue;
> ! ++n_stmts;
> ! if (!find_data_references_in_stmt (NULL, stmt,
> ! &BB_VINFO_DATAREFS (bb_vinfo)))
> ! {
> ! /* Mark the rest of the basic-block as unvectorizable. */
> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> ! {
> ! stmt = gsi_stmt (gsi);
> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
> ! }
> ! break;
> ! }
> ! }
>
> /* Analyze the data references. */
>
> --- 2328,2358 ----
> /* Check if the basic block can be vectorized. */
>
> static bb_vec_info
> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
> ! gimple_stmt_iterator region_end,
> ! vec<data_reference_p> datarefs, int n_stmts)
> {
> bb_vec_info bb_vinfo;
> vec<slp_instance> slp_instances;
> slp_instance instance;
> int i;
> int min_vf = 2;
>
> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> ! {
> ! if (dump_enabled_p ())
> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> ! "not vectorized: too many instructions in "
> ! "basic block.\n");
> ! free_data_refs (datarefs);
> ! return NULL;
> ! }
> !
> ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
> if (!bb_vinfo)
> return NULL;
>
> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>
> /* Analyze the data references. */
>
> *************** vect_slp_analyze_bb_1 (basic_block bb)
> *** 2438,2445 ****
> }
>
> /* Mark all the statements that we do not want to vectorize. */
> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
> ! !gsi_end_p (gsi); gsi_next (&gsi))
> {
> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> if (STMT_SLP_TYPE (vinfo) != pure_slp)
> --- 2430,2437 ----
> }
>
> /* Mark all the statements that we do not want to vectorize. */
> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
> {
> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> if (STMT_SLP_TYPE (vinfo) != pure_slp)
> *************** bool
> *** 2509,2585 ****
> vect_slp_bb (basic_block bb)
> {
> bb_vec_info bb_vinfo;
> - int insns = 0;
> gimple_stmt_iterator gsi;
> unsigned int vector_sizes;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>
> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> - {
> - gimple *stmt = gsi_stmt (gsi);
> - if (!is_gimple_debug (stmt)
> - && !gimple_nop_p (stmt)
> - && gimple_code (stmt) != GIMPLE_LABEL)
> - insns++;
> - if (gimple_location (stmt) != UNKNOWN_LOCATION)
> - vect_location = gimple_location (stmt);
> - }
> -
> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> - {
> - if (dump_enabled_p ())
> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> - "not vectorized: too many instructions in "
> - "basic block.\n");
> -
> - return false;
> - }
> -
> /* Autodetect first vector size we try. */
> current_vector_size = 0;
> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>
> while (1)
> {
> ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
> ! if (bb_vinfo)
> {
> ! if (!dbg_cnt (vect_slp))
> ! {
> ! destroy_bb_vec_info (bb_vinfo);
> ! return false;
> ! }
>
> if (dump_enabled_p ())
> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>
> vect_schedule_slp (bb_vinfo);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> ! "BASIC BLOCK VECTORIZED\n");
>
> destroy_bb_vec_info (bb_vinfo);
>
> ! return true;
> }
>
> ! destroy_bb_vec_info (bb_vinfo);
>
> vector_sizes &= ~current_vector_size;
> ! if (vector_sizes == 0
> ! || current_vector_size == 0)
> ! return false;
>
> ! /* Try the next biggest vector size. */
> ! current_vector_size = 1 << floor_log2 (vector_sizes);
> ! if (dump_enabled_p ())
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "***** Re-trying analysis with "
> ! "vector size %d\n", current_vector_size);
> }
> }
>
>
> --- 2501,2605 ----
> vect_slp_bb (basic_block bb)
> {
> bb_vec_info bb_vinfo;
> gimple_stmt_iterator gsi;
> unsigned int vector_sizes;
> + bool any_vectorized = false;
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>
> /* Autodetect first vector size we try. */
> current_vector_size = 0;
> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>
> + gsi = gsi_start_bb (bb);
> +
> while (1)
> {
> ! if (gsi_end_p (gsi))
> ! break;
> !
> ! gimple_stmt_iterator region_begin = gsi;
> ! vec<data_reference_p> datarefs = vNULL;
> ! int insns = 0;
> !
> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> {
> ! gimple *stmt = gsi_stmt (gsi);
> ! if (is_gimple_debug (stmt))
> ! continue;
> ! insns++;
> !
> ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
> ! vect_location = gimple_location (stmt);
> !
> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
> ! break;
> ! }
> !
> ! /* Skip leading unhandled stmts. */
> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> ! {
> ! gsi_next (&gsi);
> ! continue;
> ! }
> !
> ! gimple_stmt_iterator region_end = gsi;
>
> + bool vectorized = false;
> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
> + datarefs, insns);
> + if (bb_vinfo
> + && dbg_cnt (vect_slp))
> + {
> if (dump_enabled_p ())
> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
>
> vect_schedule_slp (bb_vinfo);
>
> if (dump_enabled_p ())
> dump_printf_loc (MSG_NOTE, vect_location,
> ! "basic block part vectorized\n");
>
> destroy_bb_vec_info (bb_vinfo);
>
> ! vectorized = true;
> }
> + else
> + destroy_bb_vec_info (bb_vinfo);
>
> ! any_vectorized |= vectorized;
>
> vector_sizes &= ~current_vector_size;
> ! if (vectorized
> ! || vector_sizes == 0
> ! || current_vector_size == 0)
> ! {
> ! if (gsi_end_p (region_end))
> ! break;
> !
> ! /* Skip the unhandled stmt. */
> ! gsi_next (&gsi);
> !
> ! /* And reset vector sizes. */
> ! current_vector_size = 0;
> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> ! }
> ! else
> ! {
> ! /* Try the next biggest vector size. */
> ! current_vector_size = 1 << floor_log2 (vector_sizes);
> ! if (dump_enabled_p ())
> ! dump_printf_loc (MSG_NOTE, vect_location,
> ! "***** Re-trying analysis with "
> ! "vector size %d\n", current_vector_size);
>
> ! /* Start over. */
> ! gsi = region_begin;
> ! }
> }
> +
> + return any_vectorized;
> }
>
>
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
> *************** static bool
> *** 107,133 ****
> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> {
> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
> !
> ! if (!gimple_bb (stmt2))
> ! return false;
> !
> ! if (loop_vinfo)
> ! {
> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
> ! return false;
> ! }
> ! else
> ! {
> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
> ! || gimple_code (stmt2) == GIMPLE_PHI)
> ! return false;
> ! }
> !
> ! gcc_assert (vinfo_for_stmt (stmt2));
> ! return true;
> }
>
> /* If the LHS of DEF_STMT has a single use, and that statement is
> --- 107,113 ----
> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> {
> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> }
>
> /* If the LHS of DEF_STMT has a single use, and that statement is
> *************** vect_pattern_recog (vec_info *vinfo)
> *** 3611,3643 ****
> loop = LOOP_VINFO_LOOP (loop_vinfo);
> bbs = LOOP_VINFO_BBS (loop_vinfo);
> nbbs = loop->num_nodes;
> }
> else
> {
> ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
> ! nbbs = 1;
> ! }
> !
> ! /* Scan through the loop stmts, applying the pattern recognition
> ! functions starting at each stmt visited: */
> ! for (i = 0; i < nbbs; i++)
> ! {
> ! basic_block bb = bbs[i];
> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> ! {
> ! if (is_a <bb_vec_info> (vinfo)
> ! && (stmt = gsi_stmt (si))
> && vinfo_for_stmt (stmt)
> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> ! continue;
>
> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
> ! for (j = 0; j < NUM_PATTERNS; j++)
> ! {
> vect_recog_func = vect_vect_recog_func_ptrs[j];
> vect_pattern_recog_1 (vect_recog_func, si,
> &stmts_to_replace);
> ! }
> ! }
> }
> }
> --- 3591,3632 ----
> loop = LOOP_VINFO_LOOP (loop_vinfo);
> bbs = LOOP_VINFO_BBS (loop_vinfo);
> nbbs = loop->num_nodes;
> +
> + /* Scan through the loop stmts, applying the pattern recognition
> + functions starting at each stmt visited: */
> + for (i = 0; i < nbbs; i++)
> + {
> + basic_block bb = bbs[i];
> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> + {
> + /* Scan over all generic vect_recog_xxx_pattern functions. */
> + for (j = 0; j < NUM_PATTERNS; j++)
> + {
> + vect_recog_func = vect_vect_recog_func_ptrs[j];
> + vect_pattern_recog_1 (vect_recog_func, si,
> + &stmts_to_replace);
> + }
> + }
> + }
> }
> else
> {
> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> ! for (si = bb_vinfo->region_begin;
> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
> ! {
> ! if ((stmt = gsi_stmt (si))
> && vinfo_for_stmt (stmt)
> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> ! continue;
>
> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
> ! for (j = 0; j < NUM_PATTERNS; j++)
> ! {
> vect_recog_func = vect_vect_recog_func_ptrs[j];
> vect_pattern_recog_1 (vect_recog_func, si,
> &stmts_to_replace);
> ! }
> ! }
> }
> }
> Index: gcc/config/i386/i386.c
> ===================================================================
> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
> *************** along with GCC; see the file COPYING3.
> *** 64,69 ****
> --- 64,70 ----
> #include "context.h"
> #include "pass_manager.h"
> #include "target-globals.h"
> + #include "gimple-iterator.h"
> #include "tree-vectorizer.h"
> #include "shrink-wrap.h"
> #include "builtins.h"
> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
> ===================================================================
> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100
> ***************
> *** 0 ****
> --- 1,44 ----
> + /* { dg-require-effective-target vect_int } */
> +
> + #include "tree-vect.h"
> +
> + extern void abort (void);
> +
> + int a[8], b[8];
> + int x;
> +
> + void __attribute__((noinline,noclone))
> + bar (void)
> + {
> + x = 1;
> + }
> +
> + void __attribute__((noinline,noclone))
> + foo(void)
> + {
> + a[0] = b[0];
> + a[1] = b[0];
> + a[2] = b[3];
> + a[3] = b[3];
> + bar ();
> + a[4] = b[4];
> + a[5] = b[7];
> + a[6] = b[4];
> + a[7] = b[7];
> + }
> +
> + int main()
> + {
> + int i;
> + check_vect ();
> + for (i = 0; i < 8; ++i)
> + b[i] = i;
> + foo ();
> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
> + abort ();
> + return 0;
> + }
> +
> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */
> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } */
> Index: gcc/tree-vect-stmts.c
> ===================================================================
> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
> *************** vect_is_simple_use (tree operand, vec_in
> *** 8196,8207 ****
> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> }
>
> ! basic_block bb = gimple_bb (*def_stmt);
> ! if ((is_a <loop_vec_info> (vinfo)
> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb))
> ! || (is_a <bb_vec_info> (vinfo)
> ! && (bb != as_a <bb_vec_info> (vinfo)->bb
> ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
> *dt = vect_external_def;
> else
> {
> --- 8196,8202 ----
> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> }
>
> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
> *dt = vect_external_def;
> else
> {
> Index: gcc/tree-vectorizer.c
> ===================================================================
> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
> *************** vect_destroy_datarefs (vec_info *vinfo)
> *** 350,355 ****
> --- 350,382 ----
> }
>
>
> + /* Return whether STMT is inside the region we try to vectorize. */
> +
> + bool
> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
> + {
> + if (!gimple_bb (stmt))
> + return false;
> +
> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> + {
> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> + return false;
> + }
> + else
> + {
> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
> + || gimple_uid (stmt) == -1U
> + || gimple_code (stmt) == GIMPLE_PHI)
> + return false;
> + }
> +
> + return true;
> + }
> +
> +
> /* If LOOP has been versioned during ifcvt, return the internal call
> guarding it. */
>
> *************** pass_slp_vectorize::execute (function *f
> *** 692,697 ****
> --- 719,732 ----
> scev_initialize ();
> }
>
> + /* Mark all stmts as not belonging to the current region. */
> + FOR_EACH_BB_FN (bb, fun)
> + {
> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> + gsi_next (&gsi))
> + gimple_set_uid (gsi_stmt (gsi), -1);
> + }
> +
> init_stmt_vec_info_vec ();
>
> FOR_EACH_BB_FN (bb, fun)
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100
> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
> ***************
> *** 52,57 ****
> --- 52,58 ----
> #include "params.h"
> #include "gimplify.h"
> #include "dwarf2.h"
> + #include "gimple-iterator.h"
> #include "tree-vectorizer.h"
> #include "aarch64-cost-tables.h"
> #include "dumpfile.h"
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-06 11:10 ` Richard Biener
@ 2015-11-06 11:12 ` Kyrill Tkachov
2015-11-06 11:27 ` Richard Biener
2015-11-10 12:56 ` Christophe Lyon
2015-11-06 16:13 ` Jeff Law
1 sibling, 2 replies; 8+ messages in thread
From: Kyrill Tkachov @ 2015-11-06 11:12 UTC (permalink / raw)
To: Richard Biener, gcc-patches
Hi Richard,
On 06/11/15 11:09, Richard Biener wrote:
> On Fri, 6 Nov 2015, Richard Biener wrote:
>
>> The following patch makes the BB vectorizer not only handle BB heads
>> (until the first stmt with a data reference it cannot handle) but
>> arbitrary regions in a BB separated by such stmts.
>>
>> This improves the number of BB vectorizations from 469 to 556
>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> 1x481.wrf failing both patched and unpatched (have to update my
>> config used for such experiments it seems ...)
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>
>> I'm currently re-testing for a cosmetic change I made when writing
>> the changelog.
>>
>> I expected (and there are) some issues with compile-time. Left
>> is unpatched and right is patched.
>>
>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>>
>> other benchmarks are unchanged. I'm double-checking now that a followup
>> patch I have which re-implements BB vectorization dependence checking
>> fixes this (that's the only quadraticness I know of).
> Fixes all but
>
> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
Note that povray is currently suffering from PR 68198
Kyrill
>
> it even improves compile-time on some:
>
> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
>
> it also increases the number of vectorized BBs to 722.
>
> Needs some work still though.
>
> Richard.
>
>> Richard.
>>
>> 2015-11-06 Richard Biener <rguenther@suse.de>
>>
>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>> members.
>> (vect_stmt_in_region_p): Declare.
>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>> (destroy_bb_vec_info): Likewise.
>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>> (vect_get_and_check_slp_defs): Likewise.
>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>> (vect_slp_bb): Likewise.
>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>> in terms of vect_stmt_in_region_p.
>> (vect_pattern_recog): Iterate over the BB region.
>> * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>>
>> * config/i386/i386.c: Include gimple-iterator.h.
>> * config/aarch64/aarch64.c: Likewise.
>>
>> * gcc.dg/vect/bb-slp-38.c: New testcase.
>>
>> Index: gcc/tree-vectorizer.h
>> ===================================================================
>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
>> *************** nested_in_vect_loop_p (struct loop *loop
>> *** 390,395 ****
>> --- 390,397 ----
>> typedef struct _bb_vec_info : public vec_info
>> {
>> basic_block bb;
>> + gimple_stmt_iterator region_begin;
>> + gimple_stmt_iterator region_end;
>> } *bb_vec_info;
>>
>> #define BB_VINFO_BB(B) (B)->bb
>> *************** void vect_pattern_recog (vec_info *);
>> *** 1085,1089 ****
>> --- 1087,1092 ----
>> /* In tree-vectorizer.c. */
>> unsigned vectorize_loops (void);
>> void vect_destroy_datarefs (vec_info *);
>> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>>
>> #endif /* GCC_TREE_VECTORIZER_H */
>> Index: gcc/tree-vect-slp.c
>> ===================================================================
>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
>> *************** vect_get_and_check_slp_defs (vec_info *v
>> *** 209,215 ****
>> unsigned int i, number_of_oprnds;
>> gimple *def_stmt;
>> enum vect_def_type dt = vect_uninitialized_def;
>> - struct loop *loop = NULL;
>> bool pattern = false;
>> slp_oprnd_info oprnd_info;
>> int first_op_idx = 1;
>> --- 209,214 ----
>> *************** vect_get_and_check_slp_defs (vec_info *v
>> *** 218,226 ****
>> bool first = stmt_num == 0;
>> bool second = stmt_num == 1;
>>
>> - if (is_a <loop_vec_info> (vinfo))
>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
>> -
>> if (is_gimple_call (stmt))
>> {
>> number_of_oprnds = gimple_call_num_args (stmt);
>> --- 217,222 ----
>> *************** again:
>> *** 276,286 ****
>> from the pattern. Check that all the stmts of the node are in the
>> pattern. */
>> if (def_stmt && gimple_bb (def_stmt)
>> ! && ((is_a <loop_vec_info> (vinfo)
>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
>> ! || (is_a <bb_vec_info> (vinfo)
>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
>> ! && gimple_code (def_stmt) != GIMPLE_PHI))
>> && vinfo_for_stmt (def_stmt)
>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> --- 272,278 ----
>> from the pattern. Check that all the stmts of the node are in the
>> pattern. */
>> if (def_stmt && gimple_bb (def_stmt)
>> ! && vect_stmt_in_region_p (vinfo, def_stmt)
>> && vinfo_for_stmt (def_stmt)
>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> *************** vect_detect_hybrid_slp (loop_vec_info lo
>> *** 2076,2091 ****
>> stmt_vec_info structs for all the stmts in it. */
>>
>> static bb_vec_info
>> ! new_bb_vec_info (basic_block bb)
>> {
>> bb_vec_info res = NULL;
>> gimple_stmt_iterator gsi;
>>
>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> res->kind = vec_info::bb;
>> BB_VINFO_BB (res) = bb;
>>
>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> {
>> gimple *stmt = gsi_stmt (gsi);
>> gimple_set_uid (stmt, 0);
>> --- 2068,2088 ----
>> stmt_vec_info structs for all the stmts in it. */
>>
>> static bb_vec_info
>> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
>> ! gimple_stmt_iterator region_end)
>> {
>> + basic_block bb = gsi_bb (region_begin);
>> bb_vec_info res = NULL;
>> gimple_stmt_iterator gsi;
>>
>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> res->kind = vec_info::bb;
>> BB_VINFO_BB (res) = bb;
>> + res->region_begin = region_begin;
>> + res->region_end = region_end;
>>
>> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
>> ! gsi_next (&gsi))
>> {
>> gimple *stmt = gsi_stmt (gsi);
>> gimple_set_uid (stmt, 0);
>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> *** 2118,2124 ****
>>
>> bb = BB_VINFO_BB (bb_vinfo);
>>
>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> {
>> gimple *stmt = gsi_stmt (si);
>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> --- 2115,2122 ----
>>
>> bb = BB_VINFO_BB (bb_vinfo);
>>
>> ! for (si = bb_vinfo->region_begin;
>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
>> {
>> gimple *stmt = gsi_stmt (si);
>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> *** 2126,2131 ****
>> --- 2124,2132 ----
>> if (stmt_info)
>> /* Free stmt_vec_info. */
>> free_stmt_vec_info (stmt);
>> +
>> + /* Reset region marker. */
>> + gimple_set_uid (stmt, -1);
>> }
>>
>> vect_destroy_datarefs (bb_vinfo);
>> *************** vect_bb_slp_scalar_cost (basic_block bb,
>> *** 2247,2254 ****
>> gimple *use_stmt;
>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>> if (!is_gimple_debug (use_stmt)
>> ! && (gimple_code (use_stmt) == GIMPLE_PHI
>> ! || gimple_bb (use_stmt) != bb
>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
>> {
>> (*life)[i] = true;
>> --- 2248,2255 ----
>> gimple *use_stmt;
>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>> if (!is_gimple_debug (use_stmt)
>> ! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
>> ! use_stmt)
>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
>> {
>> (*life)[i] = true;
>> *************** vect_bb_vectorization_profitable_p (bb_v
>> *** 2327,2366 ****
>> /* Check if the basic block can be vectorized. */
>>
>> static bb_vec_info
>> ! vect_slp_analyze_bb_1 (basic_block bb)
>> {
>> bb_vec_info bb_vinfo;
>> vec<slp_instance> slp_instances;
>> slp_instance instance;
>> int i;
>> int min_vf = 2;
>> - unsigned n_stmts = 0;
>>
>> ! bb_vinfo = new_bb_vec_info (bb);
>> if (!bb_vinfo)
>> return NULL;
>>
>> ! /* Gather all data references in the basic-block. */
>> !
>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>> ! {
>> ! gimple *stmt = gsi_stmt (gsi);
>> ! if (is_gimple_debug (stmt))
>> ! continue;
>> ! ++n_stmts;
>> ! if (!find_data_references_in_stmt (NULL, stmt,
>> ! &BB_VINFO_DATAREFS (bb_vinfo)))
>> ! {
>> ! /* Mark the rest of the basic-block as unvectorizable. */
>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> ! {
>> ! stmt = gsi_stmt (gsi);
>> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
>> ! }
>> ! break;
>> ! }
>> ! }
>>
>> /* Analyze the data references. */
>>
>> --- 2328,2358 ----
>> /* Check if the basic block can be vectorized. */
>>
>> static bb_vec_info
>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
>> ! gimple_stmt_iterator region_end,
>> ! vec<data_reference_p> datarefs, int n_stmts)
>> {
>> bb_vec_info bb_vinfo;
>> vec<slp_instance> slp_instances;
>> slp_instance instance;
>> int i;
>> int min_vf = 2;
>>
>> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> ! {
>> ! if (dump_enabled_p ())
>> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> ! "not vectorized: too many instructions in "
>> ! "basic block.\n");
>> ! free_data_refs (datarefs);
>> ! return NULL;
>> ! }
>> !
>> ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
>> if (!bb_vinfo)
>> return NULL;
>>
>> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>>
>> /* Analyze the data references. */
>>
>> *************** vect_slp_analyze_bb_1 (basic_block bb)
>> *** 2438,2445 ****
>> }
>>
>> /* Mark all the statements that we do not want to vectorize. */
>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>> {
>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> --- 2430,2437 ----
>> }
>>
>> /* Mark all the statements that we do not want to vectorize. */
>> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
>> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
>> {
>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> *************** bool
>> *** 2509,2585 ****
>> vect_slp_bb (basic_block bb)
>> {
>> bb_vec_info bb_vinfo;
>> - int insns = 0;
>> gimple_stmt_iterator gsi;
>> unsigned int vector_sizes;
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>>
>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> - {
>> - gimple *stmt = gsi_stmt (gsi);
>> - if (!is_gimple_debug (stmt)
>> - && !gimple_nop_p (stmt)
>> - && gimple_code (stmt) != GIMPLE_LABEL)
>> - insns++;
>> - if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> - vect_location = gimple_location (stmt);
>> - }
>> -
>> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> - {
>> - if (dump_enabled_p ())
>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> - "not vectorized: too many instructions in "
>> - "basic block.\n");
>> -
>> - return false;
>> - }
>> -
>> /* Autodetect first vector size we try. */
>> current_vector_size = 0;
>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>>
>> while (1)
>> {
>> ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
>> ! if (bb_vinfo)
>> {
>> ! if (!dbg_cnt (vect_slp))
>> ! {
>> ! destroy_bb_vec_info (bb_vinfo);
>> ! return false;
>> ! }
>>
>> if (dump_enabled_p ())
>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>>
>> vect_schedule_slp (bb_vinfo);
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> ! "BASIC BLOCK VECTORIZED\n");
>>
>> destroy_bb_vec_info (bb_vinfo);
>>
>> ! return true;
>> }
>>
>> ! destroy_bb_vec_info (bb_vinfo);
>>
>> vector_sizes &= ~current_vector_size;
>> ! if (vector_sizes == 0
>> ! || current_vector_size == 0)
>> ! return false;
>>
>> ! /* Try the next biggest vector size. */
>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>> ! if (dump_enabled_p ())
>> ! dump_printf_loc (MSG_NOTE, vect_location,
>> ! "***** Re-trying analysis with "
>> ! "vector size %d\n", current_vector_size);
>> }
>> }
>>
>>
>> --- 2501,2605 ----
>> vect_slp_bb (basic_block bb)
>> {
>> bb_vec_info bb_vinfo;
>> gimple_stmt_iterator gsi;
>> unsigned int vector_sizes;
>> + bool any_vectorized = false;
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>>
>> /* Autodetect first vector size we try. */
>> current_vector_size = 0;
>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>>
>> + gsi = gsi_start_bb (bb);
>> +
>> while (1)
>> {
>> ! if (gsi_end_p (gsi))
>> ! break;
>> !
>> ! gimple_stmt_iterator region_begin = gsi;
>> ! vec<data_reference_p> datarefs = vNULL;
>> ! int insns = 0;
>> !
>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> {
>> ! gimple *stmt = gsi_stmt (gsi);
>> ! if (is_gimple_debug (stmt))
>> ! continue;
>> ! insns++;
>> !
>> ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> ! vect_location = gimple_location (stmt);
>> !
>> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
>> ! break;
>> ! }
>> !
>> ! /* Skip leading unhandled stmts. */
>> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
>> ! {
>> ! gsi_next (&gsi);
>> ! continue;
>> ! }
>> !
>> ! gimple_stmt_iterator region_end = gsi;
>>
>> + bool vectorized = false;
>> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
>> + datarefs, insns);
>> + if (bb_vinfo
>> + && dbg_cnt (vect_slp))
>> + {
>> if (dump_enabled_p ())
>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
>>
>> vect_schedule_slp (bb_vinfo);
>>
>> if (dump_enabled_p ())
>> dump_printf_loc (MSG_NOTE, vect_location,
>> ! "basic block part vectorized\n");
>>
>> destroy_bb_vec_info (bb_vinfo);
>>
>> ! vectorized = true;
>> }
>> + else
>> + destroy_bb_vec_info (bb_vinfo);
>>
>> ! any_vectorized |= vectorized;
>>
>> vector_sizes &= ~current_vector_size;
>> ! if (vectorized
>> ! || vector_sizes == 0
>> ! || current_vector_size == 0)
>> ! {
>> ! if (gsi_end_p (region_end))
>> ! break;
>> !
>> ! /* Skip the unhandled stmt. */
>> ! gsi_next (&gsi);
>> !
>> ! /* And reset vector sizes. */
>> ! current_vector_size = 0;
>> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> ! }
>> ! else
>> ! {
>> ! /* Try the next biggest vector size. */
>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>> ! if (dump_enabled_p ())
>> ! dump_printf_loc (MSG_NOTE, vect_location,
>> ! "***** Re-trying analysis with "
>> ! "vector size %d\n", current_vector_size);
>>
>> ! /* Start over. */
>> ! gsi = region_begin;
>> ! }
>> }
>> +
>> + return any_vectorized;
>> }
>>
>>
>> Index: gcc/tree-vect-patterns.c
>> ===================================================================
>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
>> *************** static bool
>> *** 107,133 ****
>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> {
>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
>> !
>> ! if (!gimple_bb (stmt2))
>> ! return false;
>> !
>> ! if (loop_vinfo)
>> ! {
>> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
>> ! return false;
>> ! }
>> ! else
>> ! {
>> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
>> ! || gimple_code (stmt2) == GIMPLE_PHI)
>> ! return false;
>> ! }
>> !
>> ! gcc_assert (vinfo_for_stmt (stmt2));
>> ! return true;
>> }
>>
>> /* If the LHS of DEF_STMT has a single use, and that statement is
>> --- 107,113 ----
>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> {
>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
>> }
>>
>> /* If the LHS of DEF_STMT has a single use, and that statement is
>> *************** vect_pattern_recog (vec_info *vinfo)
>> *** 3611,3643 ****
>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>> nbbs = loop->num_nodes;
>> }
>> else
>> {
>> ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
>> ! nbbs = 1;
>> ! }
>> !
>> ! /* Scan through the loop stmts, applying the pattern recognition
>> ! functions starting at each stmt visited: */
>> ! for (i = 0; i < nbbs; i++)
>> ! {
>> ! basic_block bb = bbs[i];
>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> ! {
>> ! if (is_a <bb_vec_info> (vinfo)
>> ! && (stmt = gsi_stmt (si))
>> && vinfo_for_stmt (stmt)
>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>> ! continue;
>>
>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
>> ! for (j = 0; j < NUM_PATTERNS; j++)
>> ! {
>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>> vect_pattern_recog_1 (vect_recog_func, si,
>> &stmts_to_replace);
>> ! }
>> ! }
>> }
>> }
>> --- 3591,3632 ----
>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>> nbbs = loop->num_nodes;
>> +
>> + /* Scan through the loop stmts, applying the pattern recognition
>> + functions starting at each stmt visited: */
>> + for (i = 0; i < nbbs; i++)
>> + {
>> + basic_block bb = bbs[i];
>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> + {
>> + /* Scan over all generic vect_recog_xxx_pattern functions. */
>> + for (j = 0; j < NUM_PATTERNS; j++)
>> + {
>> + vect_recog_func = vect_vect_recog_func_ptrs[j];
>> + vect_pattern_recog_1 (vect_recog_func, si,
>> + &stmts_to_replace);
>> + }
>> + }
>> + }
>> }
>> else
>> {
>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>> ! for (si = bb_vinfo->region_begin;
>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
>> ! {
>> ! if ((stmt = gsi_stmt (si))
>> && vinfo_for_stmt (stmt)
>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>> ! continue;
>>
>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
>> ! for (j = 0; j < NUM_PATTERNS; j++)
>> ! {
>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>> vect_pattern_recog_1 (vect_recog_func, si,
>> &stmts_to_replace);
>> ! }
>> ! }
>> }
>> }
>> Index: gcc/config/i386/i386.c
>> ===================================================================
>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
>> *************** along with GCC; see the file COPYING3.
>> *** 64,69 ****
>> --- 64,70 ----
>> #include "context.h"
>> #include "pass_manager.h"
>> #include "target-globals.h"
>> + #include "gimple-iterator.h"
>> #include "tree-vectorizer.h"
>> #include "shrink-wrap.h"
>> #include "builtins.h"
>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
>> ===================================================================
>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100
>> ***************
>> *** 0 ****
>> --- 1,44 ----
>> + /* { dg-require-effective-target vect_int } */
>> +
>> + #include "tree-vect.h"
>> +
>> + extern void abort (void);
>> +
>> + int a[8], b[8];
>> + int x;
>> +
>> + void __attribute__((noinline,noclone))
>> + bar (void)
>> + {
>> + x = 1;
>> + }
>> +
>> + void __attribute__((noinline,noclone))
>> + foo(void)
>> + {
>> + a[0] = b[0];
>> + a[1] = b[0];
>> + a[2] = b[3];
>> + a[3] = b[3];
>> + bar ();
>> + a[4] = b[4];
>> + a[5] = b[7];
>> + a[6] = b[4];
>> + a[7] = b[7];
>> + }
>> +
>> + int main()
>> + {
>> + int i;
>> + check_vect ();
>> + for (i = 0; i < 8; ++i)
>> + b[i] = i;
>> + foo ();
>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
>> + abort ();
>> + return 0;
>> + }
>> +
>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */
>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } */
>> Index: gcc/tree-vect-stmts.c
>> ===================================================================
>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
>> *************** vect_is_simple_use (tree operand, vec_in
>> *** 8196,8207 ****
>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>> }
>>
>> ! basic_block bb = gimple_bb (*def_stmt);
>> ! if ((is_a <loop_vec_info> (vinfo)
>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb))
>> ! || (is_a <bb_vec_info> (vinfo)
>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb
>> ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
>> *dt = vect_external_def;
>> else
>> {
>> --- 8196,8202 ----
>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>> }
>>
>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
>> *dt = vect_external_def;
>> else
>> {
>> Index: gcc/tree-vectorizer.c
>> ===================================================================
>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
>> *************** vect_destroy_datarefs (vec_info *vinfo)
>> *** 350,355 ****
>> --- 350,382 ----
>> }
>>
>>
>> + /* Return whether STMT is inside the region we try to vectorize. */
>> +
>> + bool
>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
>> + {
>> + if (!gimple_bb (stmt))
>> + return false;
>> +
>> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
>> + {
>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> + return false;
>> + }
>> + else
>> + {
>> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
>> + || gimple_uid (stmt) == -1U
>> + || gimple_code (stmt) == GIMPLE_PHI)
>> + return false;
>> + }
>> +
>> + return true;
>> + }
>> +
>> +
>> /* If LOOP has been versioned during ifcvt, return the internal call
>> guarding it. */
>>
>> *************** pass_slp_vectorize::execute (function *f
>> *** 692,697 ****
>> --- 719,732 ----
>> scev_initialize ();
>> }
>>
>> + /* Mark all stmts as not belonging to the current region. */
>> + FOR_EACH_BB_FN (bb, fun)
>> + {
>> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
>> + gsi_next (&gsi))
>> + gimple_set_uid (gsi_stmt (gsi), -1);
>> + }
>> +
>> init_stmt_vec_info_vec ();
>>
>> FOR_EACH_BB_FN (bb, fun)
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100
>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
>> ***************
>> *** 52,57 ****
>> --- 52,58 ----
>> #include "params.h"
>> #include "gimplify.h"
>> #include "dwarf2.h"
>> + #include "gimple-iterator.h"
>> #include "tree-vectorizer.h"
>> #include "aarch64-cost-tables.h"
>> #include "dumpfile.h"
>>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-06 11:12 ` Kyrill Tkachov
@ 2015-11-06 11:27 ` Richard Biener
2015-11-10 12:56 ` Christophe Lyon
1 sibling, 0 replies; 8+ messages in thread
From: Richard Biener @ 2015-11-06 11:27 UTC (permalink / raw)
To: Kyrill Tkachov; +Cc: gcc-patches
On Fri, 6 Nov 2015, Kyrill Tkachov wrote:
> Hi Richard,
>
> On 06/11/15 11:09, Richard Biener wrote:
> > On Fri, 6 Nov 2015, Richard Biener wrote:
> >
> > > The following patch makes the BB vectorizer not only handle BB heads
> > > (until the first stmt with a data reference it cannot handle) but
> > > arbitrary regions in a BB separated by such stmts.
> > >
> > > This improves the number of BB vectorizations from 469 to 556
> > > in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> > > 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
> > > 1x481.wrf failing both patched and unpatched (have to update my
> > > config used for such experiments it seems ...)
> > >
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
> > >
> > > I'm currently re-testing for a cosmetic change I made when writing
> > > the changelog.
> > >
> > > I expected (and there are) some issues with compile-time. Left
> > > is unpatched and right is patched.
> > >
> > > '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
> > > '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
> > > '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
> > > '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
> > > '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
> > > '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
> > > '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> > > '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
> > >
> > > other benchmarks are unchanged. I'm double-checking now that a followup
> > > patch I have which re-implements BB vectorization dependence checking
> > > fixes this (that's the only quadraticness I know of).
> > Fixes all but
> >
> > '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>
> Note that povray is currently suffering from PR 68198
Ah, yeah. Seems to run into
/space/rguenther/install-trunk/usr/local/bin/g++ -c -o fnpovfpu.o
-DSPEC_CPU -DNDEBUG -Ofast -fopt-info-vec -ftime-report
-Wl,-rpath=/abuild/rguenther/install-trunk/usr/local/lib64
-DSPEC_CPU_LP64 -Wno-multichar fnpovfpu.cpp
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [fnpovfpu.o] Error 4
and dmesg
[7525617.394116] Out of memory: Kill process 31426 (cc1plus) score 832 or
sacrif
ice child
[7525617.394117] Killed process 31426 (cc1plus) total-vm:8399700kB,
anon-rss:679
0020kB, file-rss:1584kB
for me (and that's the one taking all the time). I can imagine that
with many basic-blocks the patch might end up as a net slowdown
still. I'll try to investigate anyway, maybe I'm leaking sth.
Richard.
> Kyrill
>
> >
> > it even improves compile-time on some:
> >
> > '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
> >
> > it also increases the number of vectorized BBs to 722.
> >
> > Needs some work still though.
> >
> > Richard.
> >
> > > Richard.
> > >
> > > 2015-11-06 Richard Biener <rguenther@suse.de>
> > >
> > > * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
> > > members.
> > > (vect_stmt_in_region_p): Declare.
> > > * tree-vect-slp.c (new_bb_vec_info): Work on a region.
> > > (destroy_bb_vec_info): Likewise.
> > > (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
> > > (vect_get_and_check_slp_defs): Likewise.
> > > (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
> > > (vect_slp_bb): Likewise.
> > > * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
> > > in terms of vect_stmt_in_region_p.
> > > (vect_pattern_recog): Iterate over the BB region.
> > > * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
> > > * tree-vectorizer.c (vect_stmt_in_region_p): New function.
> > > (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
> > >
> > > * config/i386/i386.c: Include gimple-iterator.h.
> > > * config/aarch64/aarch64.c: Likewise.
> > >
> > > * gcc.dg/vect/bb-slp-38.c: New testcase.
> > >
> > > Index: gcc/tree-vectorizer.h
> > > ===================================================================
> > > *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
> > > --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> > > *************** nested_in_vect_loop_p (struct loop *loop
> > > *** 390,395 ****
> > > --- 390,397 ----
> > > typedef struct _bb_vec_info : public vec_info
> > > {
> > > basic_block bb;
> > > + gimple_stmt_iterator region_begin;
> > > + gimple_stmt_iterator region_end;
> > > } *bb_vec_info;
> > > #define BB_VINFO_BB(B) (B)->bb
> > > *************** void vect_pattern_recog (vec_info *);
> > > *** 1085,1089 ****
> > > --- 1087,1092 ----
> > > /* In tree-vectorizer.c. */
> > > unsigned vectorize_loops (void);
> > > void vect_destroy_datarefs (vec_info *);
> > > + bool vect_stmt_in_region_p (vec_info *, gimple *);
> > > #endif /* GCC_TREE_VECTORIZER_H */
> > > Index: gcc/tree-vect-slp.c
> > > ===================================================================
> > > *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
> > > --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
> > > *************** vect_get_and_check_slp_defs (vec_info *v
> > > *** 209,215 ****
> > > unsigned int i, number_of_oprnds;
> > > gimple *def_stmt;
> > > enum vect_def_type dt = vect_uninitialized_def;
> > > - struct loop *loop = NULL;
> > > bool pattern = false;
> > > slp_oprnd_info oprnd_info;
> > > int first_op_idx = 1;
> > > --- 209,214 ----
> > > *************** vect_get_and_check_slp_defs (vec_info *v
> > > *** 218,226 ****
> > > bool first = stmt_num == 0;
> > > bool second = stmt_num == 1;
> > > - if (is_a <loop_vec_info> (vinfo))
> > > - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
> > > -
> > > if (is_gimple_call (stmt))
> > > {
> > > number_of_oprnds = gimple_call_num_args (stmt);
> > > --- 217,222 ----
> > > *************** again:
> > > *** 276,286 ****
> > > from the pattern. Check that all the stmts of the node are in
> > > the
> > > pattern. */
> > > if (def_stmt && gimple_bb (def_stmt)
> > > ! && ((is_a <loop_vec_info> (vinfo)
> > > ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> > > ! || (is_a <bb_vec_info> (vinfo)
> > > ! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
> > > ! && gimple_code (def_stmt) != GIMPLE_PHI))
> > > && vinfo_for_stmt (def_stmt)
> > > && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> > > && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> > > --- 272,278 ----
> > > from the pattern. Check that all the stmts of the node are in
> > > the
> > > pattern. */
> > > if (def_stmt && gimple_bb (def_stmt)
> > > ! && vect_stmt_in_region_p (vinfo, def_stmt)
> > > && vinfo_for_stmt (def_stmt)
> > > && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> > > && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> > > *************** vect_detect_hybrid_slp (loop_vec_info lo
> > > *** 2076,2091 ****
> > > stmt_vec_info structs for all the stmts in it. */
> > > static bb_vec_info
> > > ! new_bb_vec_info (basic_block bb)
> > > {
> > > bb_vec_info res = NULL;
> > > gimple_stmt_iterator gsi;
> > > res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> > > res->kind = vec_info::bb;
> > > BB_VINFO_BB (res) = bb;
> > > ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > gimple *stmt = gsi_stmt (gsi);
> > > gimple_set_uid (stmt, 0);
> > > --- 2068,2088 ----
> > > stmt_vec_info structs for all the stmts in it. */
> > > static bb_vec_info
> > > ! new_bb_vec_info (gimple_stmt_iterator region_begin,
> > > ! gimple_stmt_iterator region_end)
> > > {
> > > + basic_block bb = gsi_bb (region_begin);
> > > bb_vec_info res = NULL;
> > > gimple_stmt_iterator gsi;
> > > res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> > > res->kind = vec_info::bb;
> > > BB_VINFO_BB (res) = bb;
> > > + res->region_begin = region_begin;
> > > + res->region_end = region_end;
> > > ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
> > > ! gsi_next (&gsi))
> > > {
> > > gimple *stmt = gsi_stmt (gsi);
> > > gimple_set_uid (stmt, 0);
> > > *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> > > *** 2118,2124 ****
> > > bb = BB_VINFO_BB (bb_vinfo);
> > > ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> > > {
> > > gimple *stmt = gsi_stmt (si);
> > > stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> > > --- 2115,2122 ----
> > > bb = BB_VINFO_BB (bb_vinfo);
> > > ! for (si = bb_vinfo->region_begin;
> > > ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
> > > {
> > > gimple *stmt = gsi_stmt (si);
> > > stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> > > *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> > > *** 2126,2131 ****
> > > --- 2124,2132 ----
> > > if (stmt_info)
> > > /* Free stmt_vec_info. */
> > > free_stmt_vec_info (stmt);
> > > +
> > > + /* Reset region marker. */
> > > + gimple_set_uid (stmt, -1);
> > > }
> > > vect_destroy_datarefs (bb_vinfo);
> > > *************** vect_bb_slp_scalar_cost (basic_block bb,
> > > *** 2247,2254 ****
> > > gimple *use_stmt;
> > > FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> > > if (!is_gimple_debug (use_stmt)
> > > ! && (gimple_code (use_stmt) == GIMPLE_PHI
> > > ! || gimple_bb (use_stmt) != bb
> > > || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
> > > {
> > > (*life)[i] = true;
> > > --- 2248,2255 ----
> > > gimple *use_stmt;
> > > FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
> > > if (!is_gimple_debug (use_stmt)
> > > ! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
> > > ! use_stmt)
> > > || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
> > > {
> > > (*life)[i] = true;
> > > *************** vect_bb_vectorization_profitable_p (bb_v
> > > *** 2327,2366 ****
> > > /* Check if the basic block can be vectorized. */
> > > static bb_vec_info
> > > ! vect_slp_analyze_bb_1 (basic_block bb)
> > > {
> > > bb_vec_info bb_vinfo;
> > > vec<slp_instance> slp_instances;
> > > slp_instance instance;
> > > int i;
> > > int min_vf = 2;
> > > - unsigned n_stmts = 0;
> > > ! bb_vinfo = new_bb_vec_info (bb);
> > > if (!bb_vinfo)
> > > return NULL;
> > > ! /* Gather all data references in the basic-block. */
> > > !
> > > ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> > > ! !gsi_end_p (gsi); gsi_next (&gsi))
> > > ! {
> > > ! gimple *stmt = gsi_stmt (gsi);
> > > ! if (is_gimple_debug (stmt))
> > > ! continue;
> > > ! ++n_stmts;
> > > ! if (!find_data_references_in_stmt (NULL, stmt,
> > > ! &BB_VINFO_DATAREFS (bb_vinfo)))
> > > ! {
> > > ! /* Mark the rest of the basic-block as unvectorizable. */
> > > ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> > > ! {
> > > ! stmt = gsi_stmt (gsi);
> > > ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
> > > ! }
> > > ! break;
> > > ! }
> > > ! }
> > > /* Analyze the data references. */
> > > --- 2328,2358 ----
> > > /* Check if the basic block can be vectorized. */
> > > static bb_vec_info
> > > ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
> > > ! gimple_stmt_iterator region_end,
> > > ! vec<data_reference_p> datarefs, int n_stmts)
> > > {
> > > bb_vec_info bb_vinfo;
> > > vec<slp_instance> slp_instances;
> > > slp_instance instance;
> > > int i;
> > > int min_vf = 2;
> > > ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> > > ! {
> > > ! if (dump_enabled_p ())
> > > ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > ! "not vectorized: too many instructions in "
> > > ! "basic block.\n");
> > > ! free_data_refs (datarefs);
> > > ! return NULL;
> > > ! }
> > > !
> > > ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
> > > if (!bb_vinfo)
> > > return NULL;
> > > ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
> > > /* Analyze the data references. */
> > > *************** vect_slp_analyze_bb_1 (basic_block bb)
> > > *** 2438,2445 ****
> > > }
> > > /* Mark all the statements that we do not want to vectorize. */
> > > ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
> > > ! !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> > > if (STMT_SLP_TYPE (vinfo) != pure_slp)
> > > --- 2430,2437 ----
> > > }
> > > /* Mark all the statements that we do not want to vectorize. */
> > > ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
> > > ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
> > > (&gsi))
> > > {
> > > stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> > > if (STMT_SLP_TYPE (vinfo) != pure_slp)
> > > *************** bool
> > > *** 2509,2585 ****
> > > vect_slp_bb (basic_block bb)
> > > {
> > > bb_vec_info bb_vinfo;
> > > - int insns = 0;
> > > gimple_stmt_iterator gsi;
> > > unsigned int vector_sizes;
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > "===vect_slp_analyze_bb===\n");
> > > - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > > - {
> > > - gimple *stmt = gsi_stmt (gsi);
> > > - if (!is_gimple_debug (stmt)
> > > - && !gimple_nop_p (stmt)
> > > - && gimple_code (stmt) != GIMPLE_LABEL)
> > > - insns++;
> > > - if (gimple_location (stmt) != UNKNOWN_LOCATION)
> > > - vect_location = gimple_location (stmt);
> > > - }
> > > -
> > > - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> > > - {
> > > - if (dump_enabled_p ())
> > > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> > > - "not vectorized: too many instructions in "
> > > - "basic block.\n");
> > > -
> > > - return false;
> > > - }
> > > -
> > > /* Autodetect first vector size we try. */
> > > current_vector_size = 0;
> > > vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> > > while (1)
> > > {
> > > ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
> > > ! if (bb_vinfo)
> > > {
> > > ! if (!dbg_cnt (vect_slp))
> > > ! {
> > > ! destroy_bb_vec_info (bb_vinfo);
> > > ! return false;
> > > ! }
> > > if (dump_enabled_p ())
> > > ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
> > > vect_schedule_slp (bb_vinfo);
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > ! "BASIC BLOCK VECTORIZED\n");
> > > destroy_bb_vec_info (bb_vinfo);
> > > ! return true;
> > > }
> > > ! destroy_bb_vec_info (bb_vinfo);
> > > vector_sizes &= ~current_vector_size;
> > > ! if (vector_sizes == 0
> > > ! || current_vector_size == 0)
> > > ! return false;
> > > ! /* Try the next biggest vector size. */
> > > ! current_vector_size = 1 << floor_log2 (vector_sizes);
> > > ! if (dump_enabled_p ())
> > > ! dump_printf_loc (MSG_NOTE, vect_location,
> > > ! "***** Re-trying analysis with "
> > > ! "vector size %d\n", current_vector_size);
> > > }
> > > }
> > > --- 2501,2605 ----
> > > vect_slp_bb (basic_block bb)
> > > {
> > > bb_vec_info bb_vinfo;
> > > gimple_stmt_iterator gsi;
> > > unsigned int vector_sizes;
> > > + bool any_vectorized = false;
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > "===vect_slp_analyze_bb===\n");
> > > /* Autodetect first vector size we try. */
> > > current_vector_size = 0;
> > > vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> > > + gsi = gsi_start_bb (bb);
> > > +
> > > while (1)
> > > {
> > > ! if (gsi_end_p (gsi))
> > > ! break;
> > > !
> > > ! gimple_stmt_iterator region_begin = gsi;
> > > ! vec<data_reference_p> datarefs = vNULL;
> > > ! int insns = 0;
> > > !
> > > ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> > > {
> > > ! gimple *stmt = gsi_stmt (gsi);
> > > ! if (is_gimple_debug (stmt))
> > > ! continue;
> > > ! insns++;
> > > !
> > > ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
> > > ! vect_location = gimple_location (stmt);
> > > !
> > > ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
> > > ! break;
> > > ! }
> > > !
> > > ! /* Skip leading unhandled stmts. */
> > > ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> > > ! {
> > > ! gsi_next (&gsi);
> > > ! continue;
> > > ! }
> > > !
> > > ! gimple_stmt_iterator region_end = gsi;
> > > + bool vectorized = false;
> > > + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
> > > + datarefs, insns);
> > > + if (bb_vinfo
> > > + && dbg_cnt (vect_slp))
> > > + {
> > > if (dump_enabled_p ())
> > > ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
> > > vect_schedule_slp (bb_vinfo);
> > > if (dump_enabled_p ())
> > > dump_printf_loc (MSG_NOTE, vect_location,
> > > ! "basic block part vectorized\n");
> > > destroy_bb_vec_info (bb_vinfo);
> > > ! vectorized = true;
> > > }
> > > + else
> > > + destroy_bb_vec_info (bb_vinfo);
> > > ! any_vectorized |= vectorized;
> > > vector_sizes &= ~current_vector_size;
> > > ! if (vectorized
> > > ! || vector_sizes == 0
> > > ! || current_vector_size == 0)
> > > ! {
> > > ! if (gsi_end_p (region_end))
> > > ! break;
> > > !
> > > ! /* Skip the unhandled stmt. */
> > > ! gsi_next (&gsi);
> > > !
> > > ! /* And reset vector sizes. */
> > > ! current_vector_size = 0;
> > > ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> > > ! }
> > > ! else
> > > ! {
> > > ! /* Try the next biggest vector size. */
> > > ! current_vector_size = 1 << floor_log2 (vector_sizes);
> > > ! if (dump_enabled_p ())
> > > ! dump_printf_loc (MSG_NOTE, vect_location,
> > > ! "***** Re-trying analysis with "
> > > ! "vector size %d\n", current_vector_size);
> > > ! /* Start over. */
> > > ! gsi = region_begin;
> > > ! }
> > > }
> > > +
> > > + return any_vectorized;
> > > }
> > > Index: gcc/tree-vect-patterns.c
> > > ===================================================================
> > > *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
> > > --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
> > > *************** static bool
> > > *** 107,133 ****
> > > vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> > > {
> > > stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> > > ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
> > > ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
> > > !
> > > ! if (!gimple_bb (stmt2))
> > > ! return false;
> > > !
> > > ! if (loop_vinfo)
> > > ! {
> > > ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
> > > ! return false;
> > > ! }
> > > ! else
> > > ! {
> > > ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
> > > ! || gimple_code (stmt2) == GIMPLE_PHI)
> > > ! return false;
> > > ! }
> > > !
> > > ! gcc_assert (vinfo_for_stmt (stmt2));
> > > ! return true;
> > > }
> > > /* If the LHS of DEF_STMT has a single use, and that statement is
> > > --- 107,113 ----
> > > vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> > > {
> > > stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> > > ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> > > }
> > > /* If the LHS of DEF_STMT has a single use, and that statement is
> > > *************** vect_pattern_recog (vec_info *vinfo)
> > > *** 3611,3643 ****
> > > loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > bbs = LOOP_VINFO_BBS (loop_vinfo);
> > > nbbs = loop->num_nodes;
> > > }
> > > else
> > > {
> > > ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
> > > ! nbbs = 1;
> > > ! }
> > > !
> > > ! /* Scan through the loop stmts, applying the pattern recognition
> > > ! functions starting at each stmt visited: */
> > > ! for (i = 0; i < nbbs; i++)
> > > ! {
> > > ! basic_block bb = bbs[i];
> > > ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> > > ! {
> > > ! if (is_a <bb_vec_info> (vinfo)
> > > ! && (stmt = gsi_stmt (si))
> > > && vinfo_for_stmt (stmt)
> > > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> > > ! continue;
> > > ! /* Scan over all generic vect_recog_xxx_pattern functions.
> > > */
> > > ! for (j = 0; j < NUM_PATTERNS; j++)
> > > ! {
> > > vect_recog_func = vect_vect_recog_func_ptrs[j];
> > > vect_pattern_recog_1 (vect_recog_func, si,
> > > &stmts_to_replace);
> > > ! }
> > > ! }
> > > }
> > > }
> > > --- 3591,3632 ----
> > > loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > bbs = LOOP_VINFO_BBS (loop_vinfo);
> > > nbbs = loop->num_nodes;
> > > +
> > > + /* Scan through the loop stmts, applying the pattern recognition
> > > + functions starting at each stmt visited: */
> > > + for (i = 0; i < nbbs; i++)
> > > + {
> > > + basic_block bb = bbs[i];
> > > + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> > > + {
> > > + /* Scan over all generic vect_recog_xxx_pattern functions. */
> > > + for (j = 0; j < NUM_PATTERNS; j++)
> > > + {
> > > + vect_recog_func = vect_vect_recog_func_ptrs[j];
> > > + vect_pattern_recog_1 (vect_recog_func, si,
> > > + &stmts_to_replace);
> > > + }
> > > + }
> > > + }
> > > }
> > > else
> > > {
> > > ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> > > ! for (si = bb_vinfo->region_begin;
> > > ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
> > > ! {
> > > ! if ((stmt = gsi_stmt (si))
> > > && vinfo_for_stmt (stmt)
> > > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> > > ! continue;
> > > ! /* Scan over all generic vect_recog_xxx_pattern functions.
> > > */
> > > ! for (j = 0; j < NUM_PATTERNS; j++)
> > > ! {
> > > vect_recog_func = vect_vect_recog_func_ptrs[j];
> > > vect_pattern_recog_1 (vect_recog_func, si,
> > > &stmts_to_replace);
> > > ! }
> > > ! }
> > > }
> > > }
> > > Index: gcc/config/i386/i386.c
> > > ===================================================================
> > > *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
> > > --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
> > > *************** along with GCC; see the file COPYING3.
> > > *** 64,69 ****
> > > --- 64,70 ----
> > > #include "context.h"
> > > #include "pass_manager.h"
> > > #include "target-globals.h"
> > > + #include "gimple-iterator.h"
> > > #include "tree-vectorizer.h"
> > > #include "shrink-wrap.h"
> > > #include "builtins.h"
> > > Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
> > > ===================================================================
> > > *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> > > --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327
> > > +0100
> > > ***************
> > > *** 0 ****
> > > --- 1,44 ----
> > > + /* { dg-require-effective-target vect_int } */
> > > +
> > > + #include "tree-vect.h"
> > > +
> > > + extern void abort (void);
> > > +
> > > + int a[8], b[8];
> > > + int x;
> > > +
> > > + void __attribute__((noinline,noclone))
> > > + bar (void)
> > > + {
> > > + x = 1;
> > > + }
> > > +
> > > + void __attribute__((noinline,noclone))
> > > + foo(void)
> > > + {
> > > + a[0] = b[0];
> > > + a[1] = b[0];
> > > + a[2] = b[3];
> > > + a[3] = b[3];
> > > + bar ();
> > > + a[4] = b[4];
> > > + a[5] = b[7];
> > > + a[6] = b[4];
> > > + a[7] = b[7];
> > > + }
> > > +
> > > + int main()
> > > + {
> > > + int i;
> > > + check_vect ();
> > > + for (i = 0; i < 8; ++i)
> > > + b[i] = i;
> > > + foo ();
> > > + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
> > > + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
> > > + abort ();
> > > + return 0;
> > > + }
> > > +
> > > + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target
> > > vect_perm } } } */
> > > + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2
> > > "slp2" { target vect_perm } } } */
> > > Index: gcc/tree-vect-stmts.c
> > > ===================================================================
> > > *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
> > > --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
> > > *************** vect_is_simple_use (tree operand, vec_in
> > > *** 8196,8207 ****
> > > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> > > }
> > > ! basic_block bb = gimple_bb (*def_stmt);
> > > ! if ((is_a <loop_vec_info> (vinfo)
> > > ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop,
> > > bb))
> > > ! || (is_a <bb_vec_info> (vinfo)
> > > ! && (bb != as_a <bb_vec_info> (vinfo)->bb
> > > ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
> > > *dt = vect_external_def;
> > > else
> > > {
> > > --- 8196,8202 ----
> > > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> > > }
> > > ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
> > > *dt = vect_external_def;
> > > else
> > > {
> > > Index: gcc/tree-vectorizer.c
> > > ===================================================================
> > > *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
> > > --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
> > > *************** vect_destroy_datarefs (vec_info *vinfo)
> > > *** 350,355 ****
> > > --- 350,382 ----
> > > }
> > > + /* Return whether STMT is inside the region we try to vectorize.
> > > */
> > > +
> > > + bool
> > > + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
> > > + {
> > > + if (!gimple_bb (stmt))
> > > + return false;
> > > +
> > > + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> > > + {
> > > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> > > + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> > > + return false;
> > > + }
> > > + else
> > > + {
> > > + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> > > + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
> > > + || gimple_uid (stmt) == -1U
> > > + || gimple_code (stmt) == GIMPLE_PHI)
> > > + return false;
> > > + }
> > > +
> > > + return true;
> > > + }
> > > +
> > > +
> > > /* If LOOP has been versioned during ifcvt, return the internal call
> > > guarding it. */
> > > *************** pass_slp_vectorize::execute (function *f
> > > *** 692,697 ****
> > > --- 719,732 ----
> > > scev_initialize ();
> > > }
> > > + /* Mark all stmts as not belonging to the current region. */
> > > + FOR_EACH_BB_FN (bb, fun)
> > > + {
> > > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
> > > (gsi);
> > > + gsi_next (&gsi))
> > > + gimple_set_uid (gsi_stmt (gsi), -1);
> > > + }
> > > +
> > > init_stmt_vec_info_vec ();
> > > FOR_EACH_BB_FN (bb, fun)
> > > Index: gcc/config/aarch64/aarch64.c
> > > ===================================================================
> > > *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
> > > +0100
> > > --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
> > > ***************
> > > *** 52,57 ****
> > > --- 52,58 ----
> > > #include "params.h"
> > > #include "gimplify.h"
> > > #include "dwarf2.h"
> > > + #include "gimple-iterator.h"
> > > #include "tree-vectorizer.h"
> > > #include "aarch64-cost-tables.h"
> > > #include "dumpfile.h"
> > >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-06 11:10 ` Richard Biener
2015-11-06 11:12 ` Kyrill Tkachov
@ 2015-11-06 16:13 ` Jeff Law
1 sibling, 0 replies; 8+ messages in thread
From: Jeff Law @ 2015-11-06 16:13 UTC (permalink / raw)
To: Richard Biener, gcc-patches
On 11/06/2015 04:09 AM, Richard Biener wrote:
> On Fri, 6 Nov 2015, Richard Biener wrote:
>
>>
>> The following patch makes the BB vectorizer not only handle BB heads
>> (until the first stmt with a data reference it cannot handle) but
>> arbitrary regions in a BB separated by such stmts.
>>
>> This improves the number of BB vectorizations from 469 to 556
>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> 1x481.wrf failing both patched and unpatched (have to update my
>> config used for such experiments it seems ...)
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>
>> I'm currently re-testing for a cosmetic change I made when writing
>> the changelog.
>>
>> I expected (and there are) some issues with compile-time. Left
>> is unpatched and right is patched.
>>
>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>>
>> other benchmarks are unchanged. I'm double-checking now that a followup
>> patch I have which re-implements BB vectorization dependence checking
>> fixes this (that's the only quadraticness I know of).
>
> Fixes all but
>
> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
453.povray is mine, related to the FSM bits.
jeff
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-06 11:12 ` Kyrill Tkachov
2015-11-06 11:27 ` Richard Biener
@ 2015-11-10 12:56 ` Christophe Lyon
2015-11-10 13:03 ` Richard Biener
1 sibling, 1 reply; 8+ messages in thread
From: Christophe Lyon @ 2015-11-10 12:56 UTC (permalink / raw)
To: Kyrill Tkachov; +Cc: Richard Biener, gcc-patches
On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> Hi Richard,
>
>
> On 06/11/15 11:09, Richard Biener wrote:
>>
>> On Fri, 6 Nov 2015, Richard Biener wrote:
>>
>>> The following patch makes the BB vectorizer not only handle BB heads
>>> (until the first stmt with a data reference it cannot handle) but
>>> arbitrary regions in a BB separated by such stmts.
>>>
>>> This improves the number of BB vectorizations from 469 to 556
>>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>>> 1x481.wrf failing both patched and unpatched (have to update my
>>> config used for such experiments it seems ...)
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>>
>>> I'm currently re-testing for a cosmetic change I made when writing
>>> the changelog.
>>>
>>> I expected (and there are) some issues with compile-time. Left
>>> is unpatched and right is patched.
>>>
>>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
>>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
>>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
>>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
>>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
>>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>>>
>>> other benchmarks are unchanged. I'm double-checking now that a followup
>>> patch I have which re-implements BB vectorization dependence checking
>>> fixes this (that's the only quadraticness I know of).
>>
>> Fixes all but
>>
>> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>
>
> Note that povray is currently suffering from PR 68198
>
Hi,
I've also noticed that the new test bb-slp-38 fails on armeb:
FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects
scan-tree-dump-times slp2 "basic block part vectorized" 2
FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block
part vectorized" 2
I haven't checked in more detail, maybe it's similar to what we
discussed in PR65962
> Kyrill
>
>
>>
>> it even improves compile-time on some:
>>
>> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
>>
>> it also increases the number of vectorized BBs to 722.
>>
>> Needs some work still though.
>>
>> Richard.
>>
>>> Richard.
>>>
>>> 2015-11-06 Richard Biener <rguenther@suse.de>
>>>
>>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>>> members.
>>> (vect_stmt_in_region_p): Declare.
>>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>>> (destroy_bb_vec_info): Likewise.
>>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>>> (vect_get_and_check_slp_defs): Likewise.
>>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>>> (vect_slp_bb): Likewise.
>>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>>> in terms of vect_stmt_in_region_p.
>>> (vect_pattern_recog): Iterate over the BB region.
>>> * tree-vect-stmts.c (vect_is_simple_use): Use
>>> vect_stmt_in_region_p.
>>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>>>
>>> * config/i386/i386.c: Include gimple-iterator.h.
>>> * config/aarch64/aarch64.c: Likewise.
>>>
>>> * gcc.dg/vect/bb-slp-38.c: New testcase.
>>>
>>> Index: gcc/tree-vectorizer.h
>>> ===================================================================
>>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
>>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
>>> *************** nested_in_vect_loop_p (struct loop *loop
>>> *** 390,395 ****
>>> --- 390,397 ----
>>> typedef struct _bb_vec_info : public vec_info
>>> {
>>> basic_block bb;
>>> + gimple_stmt_iterator region_begin;
>>> + gimple_stmt_iterator region_end;
>>> } *bb_vec_info;
>>> #define BB_VINFO_BB(B) (B)->bb
>>> *************** void vect_pattern_recog (vec_info *);
>>> *** 1085,1089 ****
>>> --- 1087,1092 ----
>>> /* In tree-vectorizer.c. */
>>> unsigned vectorize_loops (void);
>>> void vect_destroy_datarefs (vec_info *);
>>> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>>> #endif /* GCC_TREE_VECTORIZER_H */
>>> Index: gcc/tree-vect-slp.c
>>> ===================================================================
>>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
>>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
>>> *************** vect_get_and_check_slp_defs (vec_info *v
>>> *** 209,215 ****
>>> unsigned int i, number_of_oprnds;
>>> gimple *def_stmt;
>>> enum vect_def_type dt = vect_uninitialized_def;
>>> - struct loop *loop = NULL;
>>> bool pattern = false;
>>> slp_oprnd_info oprnd_info;
>>> int first_op_idx = 1;
>>> --- 209,214 ----
>>> *************** vect_get_and_check_slp_defs (vec_info *v
>>> *** 218,226 ****
>>> bool first = stmt_num == 0;
>>> bool second = stmt_num == 1;
>>> - if (is_a <loop_vec_info> (vinfo))
>>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
>>> -
>>> if (is_gimple_call (stmt))
>>> {
>>> number_of_oprnds = gimple_call_num_args (stmt);
>>> --- 217,222 ----
>>> *************** again:
>>> *** 276,286 ****
>>> from the pattern. Check that all the stmts of the node are
>>> in the
>>> pattern. */
>>> if (def_stmt && gimple_bb (def_stmt)
>>> ! && ((is_a <loop_vec_info> (vinfo)
>>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
>>> ! || (is_a <bb_vec_info> (vinfo)
>>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info>
>>> (vinfo)->bb
>>> ! && gimple_code (def_stmt) != GIMPLE_PHI))
>>> && vinfo_for_stmt (def_stmt)
>>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>>> --- 272,278 ----
>>> from the pattern. Check that all the stmts of the node are
>>> in the
>>> pattern. */
>>> if (def_stmt && gimple_bb (def_stmt)
>>> ! && vect_stmt_in_region_p (vinfo, def_stmt)
>>> && vinfo_for_stmt (def_stmt)
>>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>>> *************** vect_detect_hybrid_slp (loop_vec_info lo
>>> *** 2076,2091 ****
>>> stmt_vec_info structs for all the stmts in it. */
>>> static bb_vec_info
>>> ! new_bb_vec_info (basic_block bb)
>>> {
>>> bb_vec_info res = NULL;
>>> gimple_stmt_iterator gsi;
>>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>>> res->kind = vec_info::bb;
>>> BB_VINFO_BB (res) = bb;
>>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> {
>>> gimple *stmt = gsi_stmt (gsi);
>>> gimple_set_uid (stmt, 0);
>>> --- 2068,2088 ----
>>> stmt_vec_info structs for all the stmts in it. */
>>> static bb_vec_info
>>> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
>>> ! gimple_stmt_iterator region_end)
>>> {
>>> + basic_block bb = gsi_bb (region_begin);
>>> bb_vec_info res = NULL;
>>> gimple_stmt_iterator gsi;
>>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>>> res->kind = vec_info::bb;
>>> BB_VINFO_BB (res) = bb;
>>> + res->region_begin = region_begin;
>>> + res->region_end = region_end;
>>> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
>>> ! gsi_next (&gsi))
>>> {
>>> gimple *stmt = gsi_stmt (gsi);
>>> gimple_set_uid (stmt, 0);
>>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>>> *** 2118,2124 ****
>>> bb = BB_VINFO_BB (bb_vinfo);
>>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>>> {
>>> gimple *stmt = gsi_stmt (si);
>>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>> --- 2115,2122 ----
>>> bb = BB_VINFO_BB (bb_vinfo);
>>> ! for (si = bb_vinfo->region_begin;
>>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
>>> (&si))
>>> {
>>> gimple *stmt = gsi_stmt (si);
>>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>>> *** 2126,2131 ****
>>> --- 2124,2132 ----
>>> if (stmt_info)
>>> /* Free stmt_vec_info. */
>>> free_stmt_vec_info (stmt);
>>> +
>>> + /* Reset region marker. */
>>> + gimple_set_uid (stmt, -1);
>>> }
>>> vect_destroy_datarefs (bb_vinfo);
>>> *************** vect_bb_slp_scalar_cost (basic_block bb,
>>> *** 2247,2254 ****
>>> gimple *use_stmt;
>>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>>> (def_p))
>>> if (!is_gimple_debug (use_stmt)
>>> ! && (gimple_code (use_stmt) == GIMPLE_PHI
>>> ! || gimple_bb (use_stmt) != bb
>>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>>> (use_stmt))))
>>> {
>>> (*life)[i] = true;
>>> --- 2248,2255 ----
>>> gimple *use_stmt;
>>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>>> (def_p))
>>> if (!is_gimple_debug (use_stmt)
>>> ! && (! vect_stmt_in_region_p (vinfo_for_stmt
>>> (stmt)->vinfo,
>>> ! use_stmt)
>>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>>> (use_stmt))))
>>> {
>>> (*life)[i] = true;
>>> *************** vect_bb_vectorization_profitable_p (bb_v
>>> *** 2327,2366 ****
>>> /* Check if the basic block can be vectorized. */
>>> static bb_vec_info
>>> ! vect_slp_analyze_bb_1 (basic_block bb)
>>> {
>>> bb_vec_info bb_vinfo;
>>> vec<slp_instance> slp_instances;
>>> slp_instance instance;
>>> int i;
>>> int min_vf = 2;
>>> - unsigned n_stmts = 0;
>>> ! bb_vinfo = new_bb_vec_info (bb);
>>> if (!bb_vinfo)
>>> return NULL;
>>> ! /* Gather all data references in the basic-block. */
>>> !
>>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>>> ! {
>>> ! gimple *stmt = gsi_stmt (gsi);
>>> ! if (is_gimple_debug (stmt))
>>> ! continue;
>>> ! ++n_stmts;
>>> ! if (!find_data_references_in_stmt (NULL, stmt,
>>> ! &BB_VINFO_DATAREFS (bb_vinfo)))
>>> ! {
>>> ! /* Mark the rest of the basic-block as unvectorizable. */
>>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>>> ! {
>>> ! stmt = gsi_stmt (gsi);
>>> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
>>> ! }
>>> ! break;
>>> ! }
>>> ! }
>>> /* Analyze the data references. */
>>> --- 2328,2358 ----
>>> /* Check if the basic block can be vectorized. */
>>> static bb_vec_info
>>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
>>> ! gimple_stmt_iterator region_end,
>>> ! vec<data_reference_p> datarefs, int n_stmts)
>>> {
>>> bb_vec_info bb_vinfo;
>>> vec<slp_instance> slp_instances;
>>> slp_instance instance;
>>> int i;
>>> int min_vf = 2;
>>> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>>> ! {
>>> ! if (dump_enabled_p ())
>>> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> ! "not vectorized: too many instructions in "
>>> ! "basic block.\n");
>>> ! free_data_refs (datarefs);
>>> ! return NULL;
>>> ! }
>>> !
>>> ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
>>> if (!bb_vinfo)
>>> return NULL;
>>> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>>> /* Analyze the data references. */
>>> *************** vect_slp_analyze_bb_1 (basic_block bb)
>>> *** 2438,2445 ****
>>> }
>>> /* Mark all the statements that we do not want to vectorize. */
>>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB
>>> (bb_vinfo));
>>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>>> {
>>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>>> --- 2430,2437 ----
>>> }
>>> /* Mark all the statements that we do not want to vectorize. */
>>> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
>>> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
>>> (&gsi))
>>> {
>>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>>> *************** bool
>>> *** 2509,2585 ****
>>> vect_slp_bb (basic_block bb)
>>> {
>>> bb_vec_info bb_vinfo;
>>> - int insns = 0;
>>> gimple_stmt_iterator gsi;
>>> unsigned int vector_sizes;
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> "===vect_slp_analyze_bb===\n");
>>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>>> - {
>>> - gimple *stmt = gsi_stmt (gsi);
>>> - if (!is_gimple_debug (stmt)
>>> - && !gimple_nop_p (stmt)
>>> - && gimple_code (stmt) != GIMPLE_LABEL)
>>> - insns++;
>>> - if (gimple_location (stmt) != UNKNOWN_LOCATION)
>>> - vect_location = gimple_location (stmt);
>>> - }
>>> -
>>> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>>> - {
>>> - if (dump_enabled_p ())
>>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>> - "not vectorized: too many instructions in "
>>> - "basic block.\n");
>>> -
>>> - return false;
>>> - }
>>> -
>>> /* Autodetect first vector size we try. */
>>> current_vector_size = 0;
>>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>>> while (1)
>>> {
>>> ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
>>> ! if (bb_vinfo)
>>> {
>>> ! if (!dbg_cnt (vect_slp))
>>> ! {
>>> ! destroy_bb_vec_info (bb_vinfo);
>>> ! return false;
>>> ! }
>>> if (dump_enabled_p ())
>>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>>> vect_schedule_slp (bb_vinfo);
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> ! "BASIC BLOCK VECTORIZED\n");
>>> destroy_bb_vec_info (bb_vinfo);
>>> ! return true;
>>> }
>>> ! destroy_bb_vec_info (bb_vinfo);
>>> vector_sizes &= ~current_vector_size;
>>> ! if (vector_sizes == 0
>>> ! || current_vector_size == 0)
>>> ! return false;
>>> ! /* Try the next biggest vector size. */
>>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>>> ! if (dump_enabled_p ())
>>> ! dump_printf_loc (MSG_NOTE, vect_location,
>>> ! "***** Re-trying analysis with "
>>> ! "vector size %d\n", current_vector_size);
>>> }
>>> }
>>> --- 2501,2605 ----
>>> vect_slp_bb (basic_block bb)
>>> {
>>> bb_vec_info bb_vinfo;
>>> gimple_stmt_iterator gsi;
>>> unsigned int vector_sizes;
>>> + bool any_vectorized = false;
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> "===vect_slp_analyze_bb===\n");
>>> /* Autodetect first vector size we try. */
>>> current_vector_size = 0;
>>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>>> + gsi = gsi_start_bb (bb);
>>> +
>>> while (1)
>>> {
>>> ! if (gsi_end_p (gsi))
>>> ! break;
>>> !
>>> ! gimple_stmt_iterator region_begin = gsi;
>>> ! vec<data_reference_p> datarefs = vNULL;
>>> ! int insns = 0;
>>> !
>>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>>> {
>>> ! gimple *stmt = gsi_stmt (gsi);
>>> ! if (is_gimple_debug (stmt))
>>> ! continue;
>>> ! insns++;
>>> !
>>> ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
>>> ! vect_location = gimple_location (stmt);
>>> !
>>> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
>>> ! break;
>>> ! }
>>> !
>>> ! /* Skip leading unhandled stmts. */
>>> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
>>> ! {
>>> ! gsi_next (&gsi);
>>> ! continue;
>>> ! }
>>> !
>>> ! gimple_stmt_iterator region_end = gsi;
>>> + bool vectorized = false;
>>> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
>>> + datarefs, insns);
>>> + if (bb_vinfo
>>> + && dbg_cnt (vect_slp))
>>> + {
>>> if (dump_enabled_p ())
>>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB
>>> part\n");
>>> vect_schedule_slp (bb_vinfo);
>>> if (dump_enabled_p ())
>>> dump_printf_loc (MSG_NOTE, vect_location,
>>> ! "basic block part vectorized\n");
>>> destroy_bb_vec_info (bb_vinfo);
>>> ! vectorized = true;
>>> }
>>> + else
>>> + destroy_bb_vec_info (bb_vinfo);
>>> ! any_vectorized |= vectorized;
>>> vector_sizes &= ~current_vector_size;
>>> ! if (vectorized
>>> ! || vector_sizes == 0
>>> ! || current_vector_size == 0)
>>> ! {
>>> ! if (gsi_end_p (region_end))
>>> ! break;
>>> !
>>> ! /* Skip the unhandled stmt. */
>>> ! gsi_next (&gsi);
>>> !
>>> ! /* And reset vector sizes. */
>>> ! current_vector_size = 0;
>>> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>>> ! }
>>> ! else
>>> ! {
>>> ! /* Try the next biggest vector size. */
>>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>>> ! if (dump_enabled_p ())
>>> ! dump_printf_loc (MSG_NOTE, vect_location,
>>> ! "***** Re-trying analysis with "
>>> ! "vector size %d\n", current_vector_size);
>>> ! /* Start over. */
>>> ! gsi = region_begin;
>>> ! }
>>> }
>>> +
>>> + return any_vectorized;
>>> }
>>> Index: gcc/tree-vect-patterns.c
>>> ===================================================================
>>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178
>>> +0100
>>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
>>> *************** static bool
>>> *** 107,133 ****
>>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>>> {
>>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>>> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>>> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
>>> !
>>> ! if (!gimple_bb (stmt2))
>>> ! return false;
>>> !
>>> ! if (loop_vinfo)
>>> ! {
>>> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>>> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
>>> ! return false;
>>> ! }
>>> ! else
>>> ! {
>>> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
>>> ! || gimple_code (stmt2) == GIMPLE_PHI)
>>> ! return false;
>>> ! }
>>> !
>>> ! gcc_assert (vinfo_for_stmt (stmt2));
>>> ! return true;
>>> }
>>> /* If the LHS of DEF_STMT has a single use, and that statement is
>>> --- 107,113 ----
>>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>>> {
>>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>>> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
>>> }
>>> /* If the LHS of DEF_STMT has a single use, and that statement is
>>> *************** vect_pattern_recog (vec_info *vinfo)
>>> *** 3611,3643 ****
>>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>>> nbbs = loop->num_nodes;
>>> }
>>> else
>>> {
>>> ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
>>> ! nbbs = 1;
>>> ! }
>>> !
>>> ! /* Scan through the loop stmts, applying the pattern recognition
>>> ! functions starting at each stmt visited: */
>>> ! for (i = 0; i < nbbs; i++)
>>> ! {
>>> ! basic_block bb = bbs[i];
>>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>>> ! {
>>> ! if (is_a <bb_vec_info> (vinfo)
>>> ! && (stmt = gsi_stmt (si))
>>> && vinfo_for_stmt (stmt)
>>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>>> ! continue;
>>> ! /* Scan over all generic vect_recog_xxx_pattern functions.
>>> */
>>> ! for (j = 0; j < NUM_PATTERNS; j++)
>>> ! {
>>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>>> vect_pattern_recog_1 (vect_recog_func, si,
>>> &stmts_to_replace);
>>> ! }
>>> ! }
>>> }
>>> }
>>> --- 3591,3632 ----
>>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>>> nbbs = loop->num_nodes;
>>> +
>>> + /* Scan through the loop stmts, applying the pattern recognition
>>> + functions starting at each stmt visited: */
>>> + for (i = 0; i < nbbs; i++)
>>> + {
>>> + basic_block bb = bbs[i];
>>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>>> + {
>>> + /* Scan over all generic vect_recog_xxx_pattern functions.
>>> */
>>> + for (j = 0; j < NUM_PATTERNS; j++)
>>> + {
>>> + vect_recog_func = vect_vect_recog_func_ptrs[j];
>>> + vect_pattern_recog_1 (vect_recog_func, si,
>>> + &stmts_to_replace);
>>> + }
>>> + }
>>> + }
>>> }
>>> else
>>> {
>>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>>> ! for (si = bb_vinfo->region_begin;
>>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
>>> (&si))
>>> ! {
>>> ! if ((stmt = gsi_stmt (si))
>>> && vinfo_for_stmt (stmt)
>>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>>> ! continue;
>>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
>>> ! for (j = 0; j < NUM_PATTERNS; j++)
>>> ! {
>>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>>> vect_pattern_recog_1 (vect_recog_func, si,
>>> &stmts_to_replace);
>>> ! }
>>> ! }
>>> }
>>> }
>>> Index: gcc/config/i386/i386.c
>>> ===================================================================
>>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
>>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
>>> *************** along with GCC; see the file COPYING3.
>>> *** 64,69 ****
>>> --- 64,70 ----
>>> #include "context.h"
>>> #include "pass_manager.h"
>>> #include "target-globals.h"
>>> + #include "gimple-iterator.h"
>>> #include "tree-vectorizer.h"
>>> #include "shrink-wrap.h"
>>> #include "builtins.h"
>>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
>>> ===================================================================
>>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
>>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05
>>> 14:00:48.177644327 +0100
>>> ***************
>>> *** 0 ****
>>> --- 1,44 ----
>>> + /* { dg-require-effective-target vect_int } */
>>> +
>>> + #include "tree-vect.h"
>>> +
>>> + extern void abort (void);
>>> +
>>> + int a[8], b[8];
>>> + int x;
>>> +
>>> + void __attribute__((noinline,noclone))
>>> + bar (void)
>>> + {
>>> + x = 1;
>>> + }
>>> +
>>> + void __attribute__((noinline,noclone))
>>> + foo(void)
>>> + {
>>> + a[0] = b[0];
>>> + a[1] = b[0];
>>> + a[2] = b[3];
>>> + a[3] = b[3];
>>> + bar ();
>>> + a[4] = b[4];
>>> + a[5] = b[7];
>>> + a[6] = b[4];
>>> + a[7] = b[7];
>>> + }
>>> +
>>> + int main()
>>> + {
>>> + int i;
>>> + check_vect ();
>>> + for (i = 0; i < 8; ++i)
>>> + b[i] = i;
>>> + foo ();
>>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
>>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
>>> + abort ();
>>> + return 0;
>>> + }
>>> +
>>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target
>>> vect_perm } } } */
>>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2
>>> "slp2" { target vect_perm } } } */
>>> Index: gcc/tree-vect-stmts.c
>>> ===================================================================
>>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
>>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
>>> *************** vect_is_simple_use (tree operand, vec_in
>>> *** 8196,8207 ****
>>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>>> }
>>> ! basic_block bb = gimple_bb (*def_stmt);
>>> ! if ((is_a <loop_vec_info> (vinfo)
>>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop,
>>> bb))
>>> ! || (is_a <bb_vec_info> (vinfo)
>>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb
>>> ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
>>> *dt = vect_external_def;
>>> else
>>> {
>>> --- 8196,8202 ----
>>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>>> }
>>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
>>> *dt = vect_external_def;
>>> else
>>> {
>>> Index: gcc/tree-vectorizer.c
>>> ===================================================================
>>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
>>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
>>> *************** vect_destroy_datarefs (vec_info *vinfo)
>>> *** 350,355 ****
>>> --- 350,382 ----
>>> }
>>> + /* Return whether STMT is inside the region we try to vectorize.
>>> */
>>> +
>>> + bool
>>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
>>> + {
>>> + if (!gimple_bb (stmt))
>>> + return false;
>>> +
>>> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
>>> + {
>>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>>> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>>> + return false;
>>> + }
>>> + else
>>> + {
>>> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>>> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
>>> + || gimple_uid (stmt) == -1U
>>> + || gimple_code (stmt) == GIMPLE_PHI)
>>> + return false;
>>> + }
>>> +
>>> + return true;
>>> + }
>>> +
>>> +
>>> /* If LOOP has been versioned during ifcvt, return the internal call
>>> guarding it. */
>>> *************** pass_slp_vectorize::execute (function *f
>>> *** 692,697 ****
>>> --- 719,732 ----
>>> scev_initialize ();
>>> }
>>> + /* Mark all stmts as not belonging to the current region. */
>>> + FOR_EACH_BB_FN (bb, fun)
>>> + {
>>> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
>>> (gsi);
>>> + gsi_next (&gsi))
>>> + gimple_set_uid (gsi_stmt (gsi), -1);
>>> + }
>>> +
>>> init_stmt_vec_info_vec ();
>>> FOR_EACH_BB_FN (bb, fun)
>>> Index: gcc/config/aarch64/aarch64.c
>>> ===================================================================
>>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
>>> +0100
>>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027
>>> +0100
>>> ***************
>>> *** 52,57 ****
>>> --- 52,58 ----
>>> #include "params.h"
>>> #include "gimplify.h"
>>> #include "dwarf2.h"
>>> + #include "gimple-iterator.h"
>>> #include "tree-vectorizer.h"
>>> #include "aarch64-cost-tables.h"
>>> #include "dumpfile.h"
>>>
>
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-10 12:56 ` Christophe Lyon
@ 2015-11-10 13:03 ` Richard Biener
2015-11-10 15:20 ` Christophe Lyon
0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-11-10 13:03 UTC (permalink / raw)
To: Christophe Lyon; +Cc: Kyrill Tkachov, gcc-patches
On Tue, 10 Nov 2015, Christophe Lyon wrote:
> On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
> > Hi Richard,
> >
> >
> > On 06/11/15 11:09, Richard Biener wrote:
> >>
> >> On Fri, 6 Nov 2015, Richard Biener wrote:
> >>
> >>> The following patch makes the BB vectorizer not only handle BB heads
> >>> (until the first stmt with a data reference it cannot handle) but
> >>> arbitrary regions in a BB separated by such stmts.
> >>>
> >>> This improves the number of BB vectorizations from 469 to 556
> >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
> >>> 1x481.wrf failing both patched and unpatched (have to update my
> >>> config used for such experiments it seems ...)
> >>>
> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
> >>>
> >>> I'm currently re-testing for a cosmetic change I made when writing
> >>> the changelog.
> >>>
> >>> I expected (and there are) some issues with compile-time. Left
> >>> is unpatched and right is patched.
> >>>
> >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
> >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
> >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
> >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
> >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
> >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
> >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
> >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
> >>>
> >>> other benchmarks are unchanged. I'm double-checking now that a followup
> >>> patch I have which re-implements BB vectorization dependence checking
> >>> fixes this (that's the only quadraticness I know of).
> >>
> >> Fixes all but
> >>
> >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
> >
> >
> > Note that povray is currently suffering from PR 68198
> >
>
> Hi,
>
> I've also noticed that the new test bb-slp-38 fails on armeb:
> FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects
> scan-tree-dump-times slp2 "basic block part vectorized" 2
> FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block
> part vectorized" 2
>
> I haven't checked in more detail, maybe it's similar to what we
> discussed in PR65962
Maybe though there is no misalignment involved as far as I can see.
Please open a bug and attach vectorizer dumps.
Richard.
> > Kyrill
> >
> >
> >>
> >> it even improves compile-time on some:
> >>
> >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
> >>
> >> it also increases the number of vectorized BBs to 722.
> >>
> >> Needs some work still though.
> >>
> >> Richard.
> >>
> >>> Richard.
> >>>
> >>> 2015-11-06 Richard Biener <rguenther@suse.de>
> >>>
> >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
> >>> members.
> >>> (vect_stmt_in_region_p): Declare.
> >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
> >>> (destroy_bb_vec_info): Likewise.
> >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
> >>> (vect_get_and_check_slp_defs): Likewise.
> >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
> >>> (vect_slp_bb): Likewise.
> >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
> >>> in terms of vect_stmt_in_region_p.
> >>> (vect_pattern_recog): Iterate over the BB region.
> >>> * tree-vect-stmts.c (vect_is_simple_use): Use
> >>> vect_stmt_in_region_p.
> >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
> >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
> >>>
> >>> * config/i386/i386.c: Include gimple-iterator.h.
> >>> * config/aarch64/aarch64.c: Likewise.
> >>>
> >>> * gcc.dg/vect/bb-slp-38.c: New testcase.
> >>>
> >>> Index: gcc/tree-vectorizer.h
> >>> ===================================================================
> >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
> >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> >>> *************** nested_in_vect_loop_p (struct loop *loop
> >>> *** 390,395 ****
> >>> --- 390,397 ----
> >>> typedef struct _bb_vec_info : public vec_info
> >>> {
> >>> basic_block bb;
> >>> + gimple_stmt_iterator region_begin;
> >>> + gimple_stmt_iterator region_end;
> >>> } *bb_vec_info;
> >>> #define BB_VINFO_BB(B) (B)->bb
> >>> *************** void vect_pattern_recog (vec_info *);
> >>> *** 1085,1089 ****
> >>> --- 1087,1092 ----
> >>> /* In tree-vectorizer.c. */
> >>> unsigned vectorize_loops (void);
> >>> void vect_destroy_datarefs (vec_info *);
> >>> + bool vect_stmt_in_region_p (vec_info *, gimple *);
> >>> #endif /* GCC_TREE_VECTORIZER_H */
> >>> Index: gcc/tree-vect-slp.c
> >>> ===================================================================
> >>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
> >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
> >>> *************** vect_get_and_check_slp_defs (vec_info *v
> >>> *** 209,215 ****
> >>> unsigned int i, number_of_oprnds;
> >>> gimple *def_stmt;
> >>> enum vect_def_type dt = vect_uninitialized_def;
> >>> - struct loop *loop = NULL;
> >>> bool pattern = false;
> >>> slp_oprnd_info oprnd_info;
> >>> int first_op_idx = 1;
> >>> --- 209,214 ----
> >>> *************** vect_get_and_check_slp_defs (vec_info *v
> >>> *** 218,226 ****
> >>> bool first = stmt_num == 0;
> >>> bool second = stmt_num == 1;
> >>> - if (is_a <loop_vec_info> (vinfo))
> >>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
> >>> -
> >>> if (is_gimple_call (stmt))
> >>> {
> >>> number_of_oprnds = gimple_call_num_args (stmt);
> >>> --- 217,222 ----
> >>> *************** again:
> >>> *** 276,286 ****
> >>> from the pattern. Check that all the stmts of the node are
> >>> in the
> >>> pattern. */
> >>> if (def_stmt && gimple_bb (def_stmt)
> >>> ! && ((is_a <loop_vec_info> (vinfo)
> >>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> >>> ! || (is_a <bb_vec_info> (vinfo)
> >>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info>
> >>> (vinfo)->bb
> >>> ! && gimple_code (def_stmt) != GIMPLE_PHI))
> >>> && vinfo_for_stmt (def_stmt)
> >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> >>> --- 272,278 ----
> >>> from the pattern. Check that all the stmts of the node are
> >>> in the
> >>> pattern. */
> >>> if (def_stmt && gimple_bb (def_stmt)
> >>> ! && vect_stmt_in_region_p (vinfo, def_stmt)
> >>> && vinfo_for_stmt (def_stmt)
> >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
> >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> >>> *************** vect_detect_hybrid_slp (loop_vec_info lo
> >>> *** 2076,2091 ****
> >>> stmt_vec_info structs for all the stmts in it. */
> >>> static bb_vec_info
> >>> ! new_bb_vec_info (basic_block bb)
> >>> {
> >>> bb_vec_info res = NULL;
> >>> gimple_stmt_iterator gsi;
> >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> >>> res->kind = vec_info::bb;
> >>> BB_VINFO_BB (res) = bb;
> >>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >>> {
> >>> gimple *stmt = gsi_stmt (gsi);
> >>> gimple_set_uid (stmt, 0);
> >>> --- 2068,2088 ----
> >>> stmt_vec_info structs for all the stmts in it. */
> >>> static bb_vec_info
> >>> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
> >>> ! gimple_stmt_iterator region_end)
> >>> {
> >>> + basic_block bb = gsi_bb (region_begin);
> >>> bb_vec_info res = NULL;
> >>> gimple_stmt_iterator gsi;
> >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> >>> res->kind = vec_info::bb;
> >>> BB_VINFO_BB (res) = bb;
> >>> + res->region_begin = region_begin;
> >>> + res->region_end = region_end;
> >>> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
> >>> ! gsi_next (&gsi))
> >>> {
> >>> gimple *stmt = gsi_stmt (gsi);
> >>> gimple_set_uid (stmt, 0);
> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> >>> *** 2118,2124 ****
> >>> bb = BB_VINFO_BB (bb_vinfo);
> >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>> {
> >>> gimple *stmt = gsi_stmt (si);
> >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>> --- 2115,2122 ----
> >>> bb = BB_VINFO_BB (bb_vinfo);
> >>> ! for (si = bb_vinfo->region_begin;
> >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&si))
> >>> {
> >>> gimple *stmt = gsi_stmt (si);
> >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> >>> *** 2126,2131 ****
> >>> --- 2124,2132 ----
> >>> if (stmt_info)
> >>> /* Free stmt_vec_info. */
> >>> free_stmt_vec_info (stmt);
> >>> +
> >>> + /* Reset region marker. */
> >>> + gimple_set_uid (stmt, -1);
> >>> }
> >>> vect_destroy_datarefs (bb_vinfo);
> >>> *************** vect_bb_slp_scalar_cost (basic_block bb,
> >>> *** 2247,2254 ****
> >>> gimple *use_stmt;
> >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
> >>> (def_p))
> >>> if (!is_gimple_debug (use_stmt)
> >>> ! && (gimple_code (use_stmt) == GIMPLE_PHI
> >>> ! || gimple_bb (use_stmt) != bb
> >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
> >>> (use_stmt))))
> >>> {
> >>> (*life)[i] = true;
> >>> --- 2248,2255 ----
> >>> gimple *use_stmt;
> >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
> >>> (def_p))
> >>> if (!is_gimple_debug (use_stmt)
> >>> ! && (! vect_stmt_in_region_p (vinfo_for_stmt
> >>> (stmt)->vinfo,
> >>> ! use_stmt)
> >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
> >>> (use_stmt))))
> >>> {
> >>> (*life)[i] = true;
> >>> *************** vect_bb_vectorization_profitable_p (bb_v
> >>> *** 2327,2366 ****
> >>> /* Check if the basic block can be vectorized. */
> >>> static bb_vec_info
> >>> ! vect_slp_analyze_bb_1 (basic_block bb)
> >>> {
> >>> bb_vec_info bb_vinfo;
> >>> vec<slp_instance> slp_instances;
> >>> slp_instance instance;
> >>> int i;
> >>> int min_vf = 2;
> >>> - unsigned n_stmts = 0;
> >>> ! bb_vinfo = new_bb_vec_info (bb);
> >>> if (!bb_vinfo)
> >>> return NULL;
> >>> ! /* Gather all data references in the basic-block. */
> >>> !
> >>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> ! !gsi_end_p (gsi); gsi_next (&gsi))
> >>> ! {
> >>> ! gimple *stmt = gsi_stmt (gsi);
> >>> ! if (is_gimple_debug (stmt))
> >>> ! continue;
> >>> ! ++n_stmts;
> >>> ! if (!find_data_references_in_stmt (NULL, stmt,
> >>> ! &BB_VINFO_DATAREFS (bb_vinfo)))
> >>> ! {
> >>> ! /* Mark the rest of the basic-block as unvectorizable. */
> >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> >>> ! {
> >>> ! stmt = gsi_stmt (gsi);
> >>> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
> >>> ! }
> >>> ! break;
> >>> ! }
> >>> ! }
> >>> /* Analyze the data references. */
> >>> --- 2328,2358 ----
> >>> /* Check if the basic block can be vectorized. */
> >>> static bb_vec_info
> >>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
> >>> ! gimple_stmt_iterator region_end,
> >>> ! vec<data_reference_p> datarefs, int n_stmts)
> >>> {
> >>> bb_vec_info bb_vinfo;
> >>> vec<slp_instance> slp_instances;
> >>> slp_instance instance;
> >>> int i;
> >>> int min_vf = 2;
> >>> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> >>> ! {
> >>> ! if (dump_enabled_p ())
> >>> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> ! "not vectorized: too many instructions in "
> >>> ! "basic block.\n");
> >>> ! free_data_refs (datarefs);
> >>> ! return NULL;
> >>> ! }
> >>> !
> >>> ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
> >>> if (!bb_vinfo)
> >>> return NULL;
> >>> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
> >>> /* Analyze the data references. */
> >>> *************** vect_slp_analyze_bb_1 (basic_block bb)
> >>> *** 2438,2445 ****
> >>> }
> >>> /* Mark all the statements that we do not want to vectorize. */
> >>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB
> >>> (bb_vinfo));
> >>> ! !gsi_end_p (gsi); gsi_next (&gsi))
> >>> {
> >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> >>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
> >>> --- 2430,2437 ----
> >>> }
> >>> /* Mark all the statements that we do not want to vectorize. */
> >>> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
> >>> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&gsi))
> >>> {
> >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> >>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
> >>> *************** bool
> >>> *** 2509,2585 ****
> >>> vect_slp_bb (basic_block bb)
> >>> {
> >>> bb_vec_info bb_vinfo;
> >>> - int insns = 0;
> >>> gimple_stmt_iterator gsi;
> >>> unsigned int vector_sizes;
> >>> if (dump_enabled_p ())
> >>> dump_printf_loc (MSG_NOTE, vect_location,
> >>> "===vect_slp_analyze_bb===\n");
> >>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >>> - {
> >>> - gimple *stmt = gsi_stmt (gsi);
> >>> - if (!is_gimple_debug (stmt)
> >>> - && !gimple_nop_p (stmt)
> >>> - && gimple_code (stmt) != GIMPLE_LABEL)
> >>> - insns++;
> >>> - if (gimple_location (stmt) != UNKNOWN_LOCATION)
> >>> - vect_location = gimple_location (stmt);
> >>> - }
> >>> -
> >>> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> >>> - {
> >>> - if (dump_enabled_p ())
> >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> - "not vectorized: too many instructions in "
> >>> - "basic block.\n");
> >>> -
> >>> - return false;
> >>> - }
> >>> -
> >>> /* Autodetect first vector size we try. */
> >>> current_vector_size = 0;
> >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>> while (1)
> >>> {
> >>> ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
> >>> ! if (bb_vinfo)
> >>> {
> >>> ! if (!dbg_cnt (vect_slp))
> >>> ! {
> >>> ! destroy_bb_vec_info (bb_vinfo);
> >>> ! return false;
> >>> ! }
> >>> if (dump_enabled_p ())
> >>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
> >>> vect_schedule_slp (bb_vinfo);
> >>> if (dump_enabled_p ())
> >>> dump_printf_loc (MSG_NOTE, vect_location,
> >>> ! "BASIC BLOCK VECTORIZED\n");
> >>> destroy_bb_vec_info (bb_vinfo);
> >>> ! return true;
> >>> }
> >>> ! destroy_bb_vec_info (bb_vinfo);
> >>> vector_sizes &= ~current_vector_size;
> >>> ! if (vector_sizes == 0
> >>> ! || current_vector_size == 0)
> >>> ! return false;
> >>> ! /* Try the next biggest vector size. */
> >>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
> >>> ! if (dump_enabled_p ())
> >>> ! dump_printf_loc (MSG_NOTE, vect_location,
> >>> ! "***** Re-trying analysis with "
> >>> ! "vector size %d\n", current_vector_size);
> >>> }
> >>> }
> >>> --- 2501,2605 ----
> >>> vect_slp_bb (basic_block bb)
> >>> {
> >>> bb_vec_info bb_vinfo;
> >>> gimple_stmt_iterator gsi;
> >>> unsigned int vector_sizes;
> >>> + bool any_vectorized = false;
> >>> if (dump_enabled_p ())
> >>> dump_printf_loc (MSG_NOTE, vect_location,
> >>> "===vect_slp_analyze_bb===\n");
> >>> /* Autodetect first vector size we try. */
> >>> current_vector_size = 0;
> >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>> + gsi = gsi_start_bb (bb);
> >>> +
> >>> while (1)
> >>> {
> >>> ! if (gsi_end_p (gsi))
> >>> ! break;
> >>> !
> >>> ! gimple_stmt_iterator region_begin = gsi;
> >>> ! vec<data_reference_p> datarefs = vNULL;
> >>> ! int insns = 0;
> >>> !
> >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
> >>> {
> >>> ! gimple *stmt = gsi_stmt (gsi);
> >>> ! if (is_gimple_debug (stmt))
> >>> ! continue;
> >>> ! insns++;
> >>> !
> >>> ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
> >>> ! vect_location = gimple_location (stmt);
> >>> !
> >>> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
> >>> ! break;
> >>> ! }
> >>> !
> >>> ! /* Skip leading unhandled stmts. */
> >>> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> >>> ! {
> >>> ! gsi_next (&gsi);
> >>> ! continue;
> >>> ! }
> >>> !
> >>> ! gimple_stmt_iterator region_end = gsi;
> >>> + bool vectorized = false;
> >>> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
> >>> + datarefs, insns);
> >>> + if (bb_vinfo
> >>> + && dbg_cnt (vect_slp))
> >>> + {
> >>> if (dump_enabled_p ())
> >>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB
> >>> part\n");
> >>> vect_schedule_slp (bb_vinfo);
> >>> if (dump_enabled_p ())
> >>> dump_printf_loc (MSG_NOTE, vect_location,
> >>> ! "basic block part vectorized\n");
> >>> destroy_bb_vec_info (bb_vinfo);
> >>> ! vectorized = true;
> >>> }
> >>> + else
> >>> + destroy_bb_vec_info (bb_vinfo);
> >>> ! any_vectorized |= vectorized;
> >>> vector_sizes &= ~current_vector_size;
> >>> ! if (vectorized
> >>> ! || vector_sizes == 0
> >>> ! || current_vector_size == 0)
> >>> ! {
> >>> ! if (gsi_end_p (region_end))
> >>> ! break;
> >>> !
> >>> ! /* Skip the unhandled stmt. */
> >>> ! gsi_next (&gsi);
> >>> !
> >>> ! /* And reset vector sizes. */
> >>> ! current_vector_size = 0;
> >>> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>> ! }
> >>> ! else
> >>> ! {
> >>> ! /* Try the next biggest vector size. */
> >>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
> >>> ! if (dump_enabled_p ())
> >>> ! dump_printf_loc (MSG_NOTE, vect_location,
> >>> ! "***** Re-trying analysis with "
> >>> ! "vector size %d\n", current_vector_size);
> >>> ! /* Start over. */
> >>> ! gsi = region_begin;
> >>> ! }
> >>> }
> >>> +
> >>> + return any_vectorized;
> >>> }
> >>> Index: gcc/tree-vect-patterns.c
> >>> ===================================================================
> >>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178
> >>> +0100
> >>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
> >>> *************** static bool
> >>> *** 107,133 ****
> >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> >>> {
> >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> >>> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
> >>> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
> >>> !
> >>> ! if (!gimple_bb (stmt2))
> >>> ! return false;
> >>> !
> >>> ! if (loop_vinfo)
> >>> ! {
> >>> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
> >>> ! return false;
> >>> ! }
> >>> ! else
> >>> ! {
> >>> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
> >>> ! || gimple_code (stmt2) == GIMPLE_PHI)
> >>> ! return false;
> >>> ! }
> >>> !
> >>> ! gcc_assert (vinfo_for_stmt (stmt2));
> >>> ! return true;
> >>> }
> >>> /* If the LHS of DEF_STMT has a single use, and that statement is
> >>> --- 107,113 ----
> >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> >>> {
> >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> >>> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> >>> }
> >>> /* If the LHS of DEF_STMT has a single use, and that statement is
> >>> *************** vect_pattern_recog (vec_info *vinfo)
> >>> *** 3611,3643 ****
> >>> loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> bbs = LOOP_VINFO_BBS (loop_vinfo);
> >>> nbbs = loop->num_nodes;
> >>> }
> >>> else
> >>> {
> >>> ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
> >>> ! nbbs = 1;
> >>> ! }
> >>> !
> >>> ! /* Scan through the loop stmts, applying the pattern recognition
> >>> ! functions starting at each stmt visited: */
> >>> ! for (i = 0; i < nbbs; i++)
> >>> ! {
> >>> ! basic_block bb = bbs[i];
> >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>> ! {
> >>> ! if (is_a <bb_vec_info> (vinfo)
> >>> ! && (stmt = gsi_stmt (si))
> >>> && vinfo_for_stmt (stmt)
> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> >>> ! continue;
> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions.
> >>> */
> >>> ! for (j = 0; j < NUM_PATTERNS; j++)
> >>> ! {
> >>> vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>> vect_pattern_recog_1 (vect_recog_func, si,
> >>> &stmts_to_replace);
> >>> ! }
> >>> ! }
> >>> }
> >>> }
> >>> --- 3591,3632 ----
> >>> loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> bbs = LOOP_VINFO_BBS (loop_vinfo);
> >>> nbbs = loop->num_nodes;
> >>> +
> >>> + /* Scan through the loop stmts, applying the pattern recognition
> >>> + functions starting at each stmt visited: */
> >>> + for (i = 0; i < nbbs; i++)
> >>> + {
> >>> + basic_block bb = bbs[i];
> >>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>> + {
> >>> + /* Scan over all generic vect_recog_xxx_pattern functions.
> >>> */
> >>> + for (j = 0; j < NUM_PATTERNS; j++)
> >>> + {
> >>> + vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>> + vect_pattern_recog_1 (vect_recog_func, si,
> >>> + &stmts_to_replace);
> >>> + }
> >>> + }
> >>> + }
> >>> }
> >>> else
> >>> {
> >>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> >>> ! for (si = bb_vinfo->region_begin;
> >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&si))
> >>> ! {
> >>> ! if ((stmt = gsi_stmt (si))
> >>> && vinfo_for_stmt (stmt)
> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> >>> ! continue;
> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
> >>> ! for (j = 0; j < NUM_PATTERNS; j++)
> >>> ! {
> >>> vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>> vect_pattern_recog_1 (vect_recog_func, si,
> >>> &stmts_to_replace);
> >>> ! }
> >>> ! }
> >>> }
> >>> }
> >>> Index: gcc/config/i386/i386.c
> >>> ===================================================================
> >>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
> >>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
> >>> *************** along with GCC; see the file COPYING3.
> >>> *** 64,69 ****
> >>> --- 64,70 ----
> >>> #include "context.h"
> >>> #include "pass_manager.h"
> >>> #include "target-globals.h"
> >>> + #include "gimple-iterator.h"
> >>> #include "tree-vectorizer.h"
> >>> #include "shrink-wrap.h"
> >>> #include "builtins.h"
> >>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
> >>> ===================================================================
> >>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> >>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05
> >>> 14:00:48.177644327 +0100
> >>> ***************
> >>> *** 0 ****
> >>> --- 1,44 ----
> >>> + /* { dg-require-effective-target vect_int } */
> >>> +
> >>> + #include "tree-vect.h"
> >>> +
> >>> + extern void abort (void);
> >>> +
> >>> + int a[8], b[8];
> >>> + int x;
> >>> +
> >>> + void __attribute__((noinline,noclone))
> >>> + bar (void)
> >>> + {
> >>> + x = 1;
> >>> + }
> >>> +
> >>> + void __attribute__((noinline,noclone))
> >>> + foo(void)
> >>> + {
> >>> + a[0] = b[0];
> >>> + a[1] = b[0];
> >>> + a[2] = b[3];
> >>> + a[3] = b[3];
> >>> + bar ();
> >>> + a[4] = b[4];
> >>> + a[5] = b[7];
> >>> + a[6] = b[4];
> >>> + a[7] = b[7];
> >>> + }
> >>> +
> >>> + int main()
> >>> + {
> >>> + int i;
> >>> + check_vect ();
> >>> + for (i = 0; i < 8; ++i)
> >>> + b[i] = i;
> >>> + foo ();
> >>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
> >>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
> >>> + abort ();
> >>> + return 0;
> >>> + }
> >>> +
> >>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target
> >>> vect_perm } } } */
> >>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2
> >>> "slp2" { target vect_perm } } } */
> >>> Index: gcc/tree-vect-stmts.c
> >>> ===================================================================
> >>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
> >>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
> >>> *************** vect_is_simple_use (tree operand, vec_in
> >>> *** 8196,8207 ****
> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> >>> }
> >>> ! basic_block bb = gimple_bb (*def_stmt);
> >>> ! if ((is_a <loop_vec_info> (vinfo)
> >>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop,
> >>> bb))
> >>> ! || (is_a <bb_vec_info> (vinfo)
> >>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb
> >>> ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
> >>> *dt = vect_external_def;
> >>> else
> >>> {
> >>> --- 8196,8202 ----
> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> >>> }
> >>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
> >>> *dt = vect_external_def;
> >>> else
> >>> {
> >>> Index: gcc/tree-vectorizer.c
> >>> ===================================================================
> >>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
> >>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
> >>> *************** vect_destroy_datarefs (vec_info *vinfo)
> >>> *** 350,355 ****
> >>> --- 350,382 ----
> >>> }
> >>> + /* Return whether STMT is inside the region we try to vectorize.
> >>> */
> >>> +
> >>> + bool
> >>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
> >>> + {
> >>> + if (!gimple_bb (stmt))
> >>> + return false;
> >>> +
> >>> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> >>> + {
> >>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> >>> + return false;
> >>> + }
> >>> + else
> >>> + {
> >>> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> >>> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
> >>> + || gimple_uid (stmt) == -1U
> >>> + || gimple_code (stmt) == GIMPLE_PHI)
> >>> + return false;
> >>> + }
> >>> +
> >>> + return true;
> >>> + }
> >>> +
> >>> +
> >>> /* If LOOP has been versioned during ifcvt, return the internal call
> >>> guarding it. */
> >>> *************** pass_slp_vectorize::execute (function *f
> >>> *** 692,697 ****
> >>> --- 719,732 ----
> >>> scev_initialize ();
> >>> }
> >>> + /* Mark all stmts as not belonging to the current region. */
> >>> + FOR_EACH_BB_FN (bb, fun)
> >>> + {
> >>> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
> >>> (gsi);
> >>> + gsi_next (&gsi))
> >>> + gimple_set_uid (gsi_stmt (gsi), -1);
> >>> + }
> >>> +
> >>> init_stmt_vec_info_vec ();
> >>> FOR_EACH_BB_FN (bb, fun)
> >>> Index: gcc/config/aarch64/aarch64.c
> >>> ===================================================================
> >>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
> >>> +0100
> >>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027
> >>> +0100
> >>> ***************
> >>> *** 52,57 ****
> >>> --- 52,58 ----
> >>> #include "params.h"
> >>> #include "gimplify.h"
> >>> #include "dwarf2.h"
> >>> + #include "gimple-iterator.h"
> >>> #include "tree-vectorizer.h"
> >>> #include "aarch64-cost-tables.h"
> >>> #include "dumpfile.h"
> >>>
> >
>
>
--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [PATCH] Make BB vectorizer work on sub-BBs
2015-11-10 13:03 ` Richard Biener
@ 2015-11-10 15:20 ` Christophe Lyon
0 siblings, 0 replies; 8+ messages in thread
From: Christophe Lyon @ 2015-11-10 15:20 UTC (permalink / raw)
To: Richard Biener; +Cc: Kyrill Tkachov, gcc-patches
On 10 November 2015 at 14:02, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 10 Nov 2015, Christophe Lyon wrote:
>
>> On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> > Hi Richard,
>> >
>> >
>> > On 06/11/15 11:09, Richard Biener wrote:
>> >>
>> >> On Fri, 6 Nov 2015, Richard Biener wrote:
>> >>
>> >>> The following patch makes the BB vectorizer not only handle BB heads
>> >>> (until the first stmt with a data reference it cannot handle) but
>> >>> arbitrary regions in a BB separated by such stmts.
>> >>>
>> >>> This improves the number of BB vectorizations from 469 to 556
>> >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> >>> 1x481.wrf failing both patched and unpatched (have to update my
>> >>> config used for such experiments it seems ...)
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>> >>>
>> >>> I'm currently re-testing for a cosmetic change I made when writing
>> >>> the changelog.
>> >>>
>> >>> I expected (and there are) some issues with compile-time. Left
>> >>> is unpatched and right is patched.
>> >>>
>> >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55)
>> >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>> >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157)
>> >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19)
>> >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93)
>> >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534)
>> >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>> >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>> >>>
>> >>> other benchmarks are unchanged. I'm double-checking now that a followup
>> >>> patch I have which re-implements BB vectorization dependence checking
>> >>> fixes this (that's the only quadraticness I know of).
>> >>
>> >> Fixes all but
>> >>
>> >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>> >
>> >
>> > Note that povray is currently suffering from PR 68198
>> >
>>
>> Hi,
>>
>> I've also noticed that the new test bb-slp-38 fails on armeb:
>> FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects
>> scan-tree-dump-times slp2 "basic block part vectorized" 2
>> FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block
>> part vectorized" 2
>>
>> I haven't checked in more detail, maybe it's similar to what we
>> discussed in PR65962
>
> Maybe though there is no misalignment involved as far as I can see.
>
> Please open a bug and attach vectorizer dumps.
>
OK, this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68275
> Richard.
>
>> > Kyrill
>> >
>> >
>> >>
>> >> it even improves compile-time on some:
>> >>
>> >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
>> >>
>> >> it also increases the number of vectorized BBs to 722.
>> >>
>> >> Needs some work still though.
>> >>
>> >> Richard.
>> >>
>> >>> Richard.
>> >>>
>> >>> 2015-11-06 Richard Biener <rguenther@suse.de>
>> >>>
>> >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>> >>> members.
>> >>> (vect_stmt_in_region_p): Declare.
>> >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>> >>> (destroy_bb_vec_info): Likewise.
>> >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>> >>> (vect_get_and_check_slp_defs): Likewise.
>> >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>> >>> (vect_slp_bb): Likewise.
>> >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>> >>> in terms of vect_stmt_in_region_p.
>> >>> (vect_pattern_recog): Iterate over the BB region.
>> >>> * tree-vect-stmts.c (vect_is_simple_use): Use
>> >>> vect_stmt_in_region_p.
>> >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>> >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>> >>>
>> >>> * config/i386/i386.c: Include gimple-iterator.h.
>> >>> * config/aarch64/aarch64.c: Likewise.
>> >>>
>> >>> * gcc.dg/vect/bb-slp-38.c: New testcase.
>> >>>
>> >>> Index: gcc/tree-vectorizer.h
>> >>> ===================================================================
>> >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
>> >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
>> >>> *************** nested_in_vect_loop_p (struct loop *loop
>> >>> *** 390,395 ****
>> >>> --- 390,397 ----
>> >>> typedef struct _bb_vec_info : public vec_info
>> >>> {
>> >>> basic_block bb;
>> >>> + gimple_stmt_iterator region_begin;
>> >>> + gimple_stmt_iterator region_end;
>> >>> } *bb_vec_info;
>> >>> #define BB_VINFO_BB(B) (B)->bb
>> >>> *************** void vect_pattern_recog (vec_info *);
>> >>> *** 1085,1089 ****
>> >>> --- 1087,1092 ----
>> >>> /* In tree-vectorizer.c. */
>> >>> unsigned vectorize_loops (void);
>> >>> void vect_destroy_datarefs (vec_info *);
>> >>> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>> >>> #endif /* GCC_TREE_VECTORIZER_H */
>> >>> Index: gcc/tree-vect-slp.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
>> >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
>> >>> *************** vect_get_and_check_slp_defs (vec_info *v
>> >>> *** 209,215 ****
>> >>> unsigned int i, number_of_oprnds;
>> >>> gimple *def_stmt;
>> >>> enum vect_def_type dt = vect_uninitialized_def;
>> >>> - struct loop *loop = NULL;
>> >>> bool pattern = false;
>> >>> slp_oprnd_info oprnd_info;
>> >>> int first_op_idx = 1;
>> >>> --- 209,214 ----
>> >>> *************** vect_get_and_check_slp_defs (vec_info *v
>> >>> *** 218,226 ****
>> >>> bool first = stmt_num == 0;
>> >>> bool second = stmt_num == 1;
>> >>> - if (is_a <loop_vec_info> (vinfo))
>> >>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
>> >>> -
>> >>> if (is_gimple_call (stmt))
>> >>> {
>> >>> number_of_oprnds = gimple_call_num_args (stmt);
>> >>> --- 217,222 ----
>> >>> *************** again:
>> >>> *** 276,286 ****
>> >>> from the pattern. Check that all the stmts of the node are
>> >>> in the
>> >>> pattern. */
>> >>> if (def_stmt && gimple_bb (def_stmt)
>> >>> ! && ((is_a <loop_vec_info> (vinfo)
>> >>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
>> >>> ! || (is_a <bb_vec_info> (vinfo)
>> >>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info>
>> >>> (vinfo)->bb
>> >>> ! && gimple_code (def_stmt) != GIMPLE_PHI))
>> >>> && vinfo_for_stmt (def_stmt)
>> >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> >>> --- 272,278 ----
>> >>> from the pattern. Check that all the stmts of the node are
>> >>> in the
>> >>> pattern. */
>> >>> if (def_stmt && gimple_bb (def_stmt)
>> >>> ! && vect_stmt_in_region_p (vinfo, def_stmt)
>> >>> && vinfo_for_stmt (def_stmt)
>> >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> >>> *************** vect_detect_hybrid_slp (loop_vec_info lo
>> >>> *** 2076,2091 ****
>> >>> stmt_vec_info structs for all the stmts in it. */
>> >>> static bb_vec_info
>> >>> ! new_bb_vec_info (basic_block bb)
>> >>> {
>> >>> bb_vec_info res = NULL;
>> >>> gimple_stmt_iterator gsi;
>> >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> >>> res->kind = vec_info::bb;
>> >>> BB_VINFO_BB (res) = bb;
>> >>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> {
>> >>> gimple *stmt = gsi_stmt (gsi);
>> >>> gimple_set_uid (stmt, 0);
>> >>> --- 2068,2088 ----
>> >>> stmt_vec_info structs for all the stmts in it. */
>> >>> static bb_vec_info
>> >>> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
>> >>> ! gimple_stmt_iterator region_end)
>> >>> {
>> >>> + basic_block bb = gsi_bb (region_begin);
>> >>> bb_vec_info res = NULL;
>> >>> gimple_stmt_iterator gsi;
>> >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> >>> res->kind = vec_info::bb;
>> >>> BB_VINFO_BB (res) = bb;
>> >>> + res->region_begin = region_begin;
>> >>> + res->region_end = region_end;
>> >>> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
>> >>> ! gsi_next (&gsi))
>> >>> {
>> >>> gimple *stmt = gsi_stmt (gsi);
>> >>> gimple_set_uid (stmt, 0);
>> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> >>> *** 2118,2124 ****
>> >>> bb = BB_VINFO_BB (bb_vinfo);
>> >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> >>> {
>> >>> gimple *stmt = gsi_stmt (si);
>> >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> >>> --- 2115,2122 ----
>> >>> bb = BB_VINFO_BB (bb_vinfo);
>> >>> ! for (si = bb_vinfo->region_begin;
>> >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
>> >>> (&si))
>> >>> {
>> >>> gimple *stmt = gsi_stmt (si);
>> >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> >>> *** 2126,2131 ****
>> >>> --- 2124,2132 ----
>> >>> if (stmt_info)
>> >>> /* Free stmt_vec_info. */
>> >>> free_stmt_vec_info (stmt);
>> >>> +
>> >>> + /* Reset region marker. */
>> >>> + gimple_set_uid (stmt, -1);
>> >>> }
>> >>> vect_destroy_datarefs (bb_vinfo);
>> >>> *************** vect_bb_slp_scalar_cost (basic_block bb,
>> >>> *** 2247,2254 ****
>> >>> gimple *use_stmt;
>> >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>> >>> (def_p))
>> >>> if (!is_gimple_debug (use_stmt)
>> >>> ! && (gimple_code (use_stmt) == GIMPLE_PHI
>> >>> ! || gimple_bb (use_stmt) != bb
>> >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>> >>> (use_stmt))))
>> >>> {
>> >>> (*life)[i] = true;
>> >>> --- 2248,2255 ----
>> >>> gimple *use_stmt;
>> >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>> >>> (def_p))
>> >>> if (!is_gimple_debug (use_stmt)
>> >>> ! && (! vect_stmt_in_region_p (vinfo_for_stmt
>> >>> (stmt)->vinfo,
>> >>> ! use_stmt)
>> >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>> >>> (use_stmt))))
>> >>> {
>> >>> (*life)[i] = true;
>> >>> *************** vect_bb_vectorization_profitable_p (bb_v
>> >>> *** 2327,2366 ****
>> >>> /* Check if the basic block can be vectorized. */
>> >>> static bb_vec_info
>> >>> ! vect_slp_analyze_bb_1 (basic_block bb)
>> >>> {
>> >>> bb_vec_info bb_vinfo;
>> >>> vec<slp_instance> slp_instances;
>> >>> slp_instance instance;
>> >>> int i;
>> >>> int min_vf = 2;
>> >>> - unsigned n_stmts = 0;
>> >>> ! bb_vinfo = new_bb_vec_info (bb);
>> >>> if (!bb_vinfo)
>> >>> return NULL;
>> >>> ! /* Gather all data references in the basic-block. */
>> >>> !
>> >>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>> >>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> ! {
>> >>> ! gimple *stmt = gsi_stmt (gsi);
>> >>> ! if (is_gimple_debug (stmt))
>> >>> ! continue;
>> >>> ! ++n_stmts;
>> >>> ! if (!find_data_references_in_stmt (NULL, stmt,
>> >>> ! &BB_VINFO_DATAREFS (bb_vinfo)))
>> >>> ! {
>> >>> ! /* Mark the rest of the basic-block as unvectorizable. */
>> >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> ! {
>> >>> ! stmt = gsi_stmt (gsi);
>> >>> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
>> >>> ! }
>> >>> ! break;
>> >>> ! }
>> >>> ! }
>> >>> /* Analyze the data references. */
>> >>> --- 2328,2358 ----
>> >>> /* Check if the basic block can be vectorized. */
>> >>> static bb_vec_info
>> >>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
>> >>> ! gimple_stmt_iterator region_end,
>> >>> ! vec<data_reference_p> datarefs, int n_stmts)
>> >>> {
>> >>> bb_vec_info bb_vinfo;
>> >>> vec<slp_instance> slp_instances;
>> >>> slp_instance instance;
>> >>> int i;
>> >>> int min_vf = 2;
>> >>> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> >>> ! {
>> >>> ! if (dump_enabled_p ())
>> >>> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>> ! "not vectorized: too many instructions in "
>> >>> ! "basic block.\n");
>> >>> ! free_data_refs (datarefs);
>> >>> ! return NULL;
>> >>> ! }
>> >>> !
>> >>> ! bb_vinfo = new_bb_vec_info (region_begin, region_end);
>> >>> if (!bb_vinfo)
>> >>> return NULL;
>> >>> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>> >>> /* Analyze the data references. */
>> >>> *************** vect_slp_analyze_bb_1 (basic_block bb)
>> >>> *** 2438,2445 ****
>> >>> }
>> >>> /* Mark all the statements that we do not want to vectorize. */
>> >>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB
>> >>> (bb_vinfo));
>> >>> ! !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> {
>> >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> >>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> >>> --- 2430,2437 ----
>> >>> }
>> >>> /* Mark all the statements that we do not want to vectorize. */
>> >>> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
>> >>> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
>> >>> (&gsi))
>> >>> {
>> >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> >>> if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> >>> *************** bool
>> >>> *** 2509,2585 ****
>> >>> vect_slp_bb (basic_block bb)
>> >>> {
>> >>> bb_vec_info bb_vinfo;
>> >>> - int insns = 0;
>> >>> gimple_stmt_iterator gsi;
>> >>> unsigned int vector_sizes;
>> >>> if (dump_enabled_p ())
>> >>> dump_printf_loc (MSG_NOTE, vect_location,
>> >>> "===vect_slp_analyze_bb===\n");
>> >>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> - {
>> >>> - gimple *stmt = gsi_stmt (gsi);
>> >>> - if (!is_gimple_debug (stmt)
>> >>> - && !gimple_nop_p (stmt)
>> >>> - && gimple_code (stmt) != GIMPLE_LABEL)
>> >>> - insns++;
>> >>> - if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> >>> - vect_location = gimple_location (stmt);
>> >>> - }
>> >>> -
>> >>> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> >>> - {
>> >>> - if (dump_enabled_p ())
>> >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>> - "not vectorized: too many instructions in "
>> >>> - "basic block.\n");
>> >>> -
>> >>> - return false;
>> >>> - }
>> >>> -
>> >>> /* Autodetect first vector size we try. */
>> >>> current_vector_size = 0;
>> >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>> while (1)
>> >>> {
>> >>> ! bb_vinfo = vect_slp_analyze_bb_1 (bb);
>> >>> ! if (bb_vinfo)
>> >>> {
>> >>> ! if (!dbg_cnt (vect_slp))
>> >>> ! {
>> >>> ! destroy_bb_vec_info (bb_vinfo);
>> >>> ! return false;
>> >>> ! }
>> >>> if (dump_enabled_p ())
>> >>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>> >>> vect_schedule_slp (bb_vinfo);
>> >>> if (dump_enabled_p ())
>> >>> dump_printf_loc (MSG_NOTE, vect_location,
>> >>> ! "BASIC BLOCK VECTORIZED\n");
>> >>> destroy_bb_vec_info (bb_vinfo);
>> >>> ! return true;
>> >>> }
>> >>> ! destroy_bb_vec_info (bb_vinfo);
>> >>> vector_sizes &= ~current_vector_size;
>> >>> ! if (vector_sizes == 0
>> >>> ! || current_vector_size == 0)
>> >>> ! return false;
>> >>> ! /* Try the next biggest vector size. */
>> >>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>> >>> ! if (dump_enabled_p ())
>> >>> ! dump_printf_loc (MSG_NOTE, vect_location,
>> >>> ! "***** Re-trying analysis with "
>> >>> ! "vector size %d\n", current_vector_size);
>> >>> }
>> >>> }
>> >>> --- 2501,2605 ----
>> >>> vect_slp_bb (basic_block bb)
>> >>> {
>> >>> bb_vec_info bb_vinfo;
>> >>> gimple_stmt_iterator gsi;
>> >>> unsigned int vector_sizes;
>> >>> + bool any_vectorized = false;
>> >>> if (dump_enabled_p ())
>> >>> dump_printf_loc (MSG_NOTE, vect_location,
>> >>> "===vect_slp_analyze_bb===\n");
>> >>> /* Autodetect first vector size we try. */
>> >>> current_vector_size = 0;
>> >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>> + gsi = gsi_start_bb (bb);
>> >>> +
>> >>> while (1)
>> >>> {
>> >>> ! if (gsi_end_p (gsi))
>> >>> ! break;
>> >>> !
>> >>> ! gimple_stmt_iterator region_begin = gsi;
>> >>> ! vec<data_reference_p> datarefs = vNULL;
>> >>> ! int insns = 0;
>> >>> !
>> >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> {
>> >>> ! gimple *stmt = gsi_stmt (gsi);
>> >>> ! if (is_gimple_debug (stmt))
>> >>> ! continue;
>> >>> ! insns++;
>> >>> !
>> >>> ! if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> >>> ! vect_location = gimple_location (stmt);
>> >>> !
>> >>> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
>> >>> ! break;
>> >>> ! }
>> >>> !
>> >>> ! /* Skip leading unhandled stmts. */
>> >>> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi))
>> >>> ! {
>> >>> ! gsi_next (&gsi);
>> >>> ! continue;
>> >>> ! }
>> >>> !
>> >>> ! gimple_stmt_iterator region_end = gsi;
>> >>> + bool vectorized = false;
>> >>> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
>> >>> + datarefs, insns);
>> >>> + if (bb_vinfo
>> >>> + && dbg_cnt (vect_slp))
>> >>> + {
>> >>> if (dump_enabled_p ())
>> >>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB
>> >>> part\n");
>> >>> vect_schedule_slp (bb_vinfo);
>> >>> if (dump_enabled_p ())
>> >>> dump_printf_loc (MSG_NOTE, vect_location,
>> >>> ! "basic block part vectorized\n");
>> >>> destroy_bb_vec_info (bb_vinfo);
>> >>> ! vectorized = true;
>> >>> }
>> >>> + else
>> >>> + destroy_bb_vec_info (bb_vinfo);
>> >>> ! any_vectorized |= vectorized;
>> >>> vector_sizes &= ~current_vector_size;
>> >>> ! if (vectorized
>> >>> ! || vector_sizes == 0
>> >>> ! || current_vector_size == 0)
>> >>> ! {
>> >>> ! if (gsi_end_p (region_end))
>> >>> ! break;
>> >>> !
>> >>> ! /* Skip the unhandled stmt. */
>> >>> ! gsi_next (&gsi);
>> >>> !
>> >>> ! /* And reset vector sizes. */
>> >>> ! current_vector_size = 0;
>> >>> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>> ! }
>> >>> ! else
>> >>> ! {
>> >>> ! /* Try the next biggest vector size. */
>> >>> ! current_vector_size = 1 << floor_log2 (vector_sizes);
>> >>> ! if (dump_enabled_p ())
>> >>> ! dump_printf_loc (MSG_NOTE, vect_location,
>> >>> ! "***** Re-trying analysis with "
>> >>> ! "vector size %d\n", current_vector_size);
>> >>> ! /* Start over. */
>> >>> ! gsi = region_begin;
>> >>> ! }
>> >>> }
>> >>> +
>> >>> + return any_vectorized;
>> >>> }
>> >>> Index: gcc/tree-vect-patterns.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178
>> >>> +0100
>> >>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
>> >>> *************** static bool
>> >>> *** 107,133 ****
>> >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> >>> {
>> >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> >>> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>> >>> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
>> >>> !
>> >>> ! if (!gimple_bb (stmt2))
>> >>> ! return false;
>> >>> !
>> >>> ! if (loop_vinfo)
>> >>> ! {
>> >>> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
>> >>> ! return false;
>> >>> ! }
>> >>> ! else
>> >>> ! {
>> >>> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
>> >>> ! || gimple_code (stmt2) == GIMPLE_PHI)
>> >>> ! return false;
>> >>> ! }
>> >>> !
>> >>> ! gcc_assert (vinfo_for_stmt (stmt2));
>> >>> ! return true;
>> >>> }
>> >>> /* If the LHS of DEF_STMT has a single use, and that statement is
>> >>> --- 107,113 ----
>> >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> >>> {
>> >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> >>> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
>> >>> }
>> >>> /* If the LHS of DEF_STMT has a single use, and that statement is
>> >>> *************** vect_pattern_recog (vec_info *vinfo)
>> >>> *** 3611,3643 ****
>> >>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>> >>> nbbs = loop->num_nodes;
>> >>> }
>> >>> else
>> >>> {
>> >>> ! bbs = &as_a <bb_vec_info> (vinfo)->bb;
>> >>> ! nbbs = 1;
>> >>> ! }
>> >>> !
>> >>> ! /* Scan through the loop stmts, applying the pattern recognition
>> >>> ! functions starting at each stmt visited: */
>> >>> ! for (i = 0; i < nbbs; i++)
>> >>> ! {
>> >>> ! basic_block bb = bbs[i];
>> >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> >>> ! {
>> >>> ! if (is_a <bb_vec_info> (vinfo)
>> >>> ! && (stmt = gsi_stmt (si))
>> >>> && vinfo_for_stmt (stmt)
>> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>> >>> ! continue;
>> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions.
>> >>> */
>> >>> ! for (j = 0; j < NUM_PATTERNS; j++)
>> >>> ! {
>> >>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>> >>> vect_pattern_recog_1 (vect_recog_func, si,
>> >>> &stmts_to_replace);
>> >>> ! }
>> >>> ! }
>> >>> }
>> >>> }
>> >>> --- 3591,3632 ----
>> >>> loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> bbs = LOOP_VINFO_BBS (loop_vinfo);
>> >>> nbbs = loop->num_nodes;
>> >>> +
>> >>> + /* Scan through the loop stmts, applying the pattern recognition
>> >>> + functions starting at each stmt visited: */
>> >>> + for (i = 0; i < nbbs; i++)
>> >>> + {
>> >>> + basic_block bb = bbs[i];
>> >>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> >>> + {
>> >>> + /* Scan over all generic vect_recog_xxx_pattern functions.
>> >>> */
>> >>> + for (j = 0; j < NUM_PATTERNS; j++)
>> >>> + {
>> >>> + vect_recog_func = vect_vect_recog_func_ptrs[j];
>> >>> + vect_pattern_recog_1 (vect_recog_func, si,
>> >>> + &stmts_to_replace);
>> >>> + }
>> >>> + }
>> >>> + }
>> >>> }
>> >>> else
>> >>> {
>> >>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>> >>> ! for (si = bb_vinfo->region_begin;
>> >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
>> >>> (&si))
>> >>> ! {
>> >>> ! if ((stmt = gsi_stmt (si))
>> >>> && vinfo_for_stmt (stmt)
>> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
>> >>> ! continue;
>> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */
>> >>> ! for (j = 0; j < NUM_PATTERNS; j++)
>> >>> ! {
>> >>> vect_recog_func = vect_vect_recog_func_ptrs[j];
>> >>> vect_pattern_recog_1 (vect_recog_func, si,
>> >>> &stmts_to_replace);
>> >>> ! }
>> >>> ! }
>> >>> }
>> >>> }
>> >>> Index: gcc/config/i386/i386.c
>> >>> ===================================================================
>> >>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
>> >>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
>> >>> *************** along with GCC; see the file COPYING3.
>> >>> *** 64,69 ****
>> >>> --- 64,70 ----
>> >>> #include "context.h"
>> >>> #include "pass_manager.h"
>> >>> #include "target-globals.h"
>> >>> + #include "gimple-iterator.h"
>> >>> #include "tree-vectorizer.h"
>> >>> #include "shrink-wrap.h"
>> >>> #include "builtins.h"
>> >>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
>> >>> ===================================================================
>> >>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
>> >>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05
>> >>> 14:00:48.177644327 +0100
>> >>> ***************
>> >>> *** 0 ****
>> >>> --- 1,44 ----
>> >>> + /* { dg-require-effective-target vect_int } */
>> >>> +
>> >>> + #include "tree-vect.h"
>> >>> +
>> >>> + extern void abort (void);
>> >>> +
>> >>> + int a[8], b[8];
>> >>> + int x;
>> >>> +
>> >>> + void __attribute__((noinline,noclone))
>> >>> + bar (void)
>> >>> + {
>> >>> + x = 1;
>> >>> + }
>> >>> +
>> >>> + void __attribute__((noinline,noclone))
>> >>> + foo(void)
>> >>> + {
>> >>> + a[0] = b[0];
>> >>> + a[1] = b[0];
>> >>> + a[2] = b[3];
>> >>> + a[3] = b[3];
>> >>> + bar ();
>> >>> + a[4] = b[4];
>> >>> + a[5] = b[7];
>> >>> + a[6] = b[4];
>> >>> + a[7] = b[7];
>> >>> + }
>> >>> +
>> >>> + int main()
>> >>> + {
>> >>> + int i;
>> >>> + check_vect ();
>> >>> + for (i = 0; i < 8; ++i)
>> >>> + b[i] = i;
>> >>> + foo ();
>> >>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
>> >>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
>> >>> + abort ();
>> >>> + return 0;
>> >>> + }
>> >>> +
>> >>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target
>> >>> vect_perm } } } */
>> >>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2
>> >>> "slp2" { target vect_perm } } } */
>> >>> Index: gcc/tree-vect-stmts.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
>> >>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
>> >>> *************** vect_is_simple_use (tree operand, vec_in
>> >>> *** 8196,8207 ****
>> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>> >>> }
>> >>> ! basic_block bb = gimple_bb (*def_stmt);
>> >>> ! if ((is_a <loop_vec_info> (vinfo)
>> >>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop,
>> >>> bb))
>> >>> ! || (is_a <bb_vec_info> (vinfo)
>> >>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb
>> >>> ! || gimple_code (*def_stmt) == GIMPLE_PHI)))
>> >>> *dt = vect_external_def;
>> >>> else
>> >>> {
>> >>> --- 8196,8202 ----
>> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
>> >>> }
>> >>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt))
>> >>> *dt = vect_external_def;
>> >>> else
>> >>> {
>> >>> Index: gcc/tree-vectorizer.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
>> >>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
>> >>> *************** vect_destroy_datarefs (vec_info *vinfo)
>> >>> *** 350,355 ****
>> >>> --- 350,382 ----
>> >>> }
>> >>> + /* Return whether STMT is inside the region we try to vectorize.
>> >>> */
>> >>> +
>> >>> + bool
>> >>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
>> >>> + {
>> >>> + if (!gimple_bb (stmt))
>> >>> + return false;
>> >>> +
>> >>> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
>> >>> + {
>> >>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> >>> + return false;
>> >>> + }
>> >>> + else
>> >>> + {
>> >>> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>> >>> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
>> >>> + || gimple_uid (stmt) == -1U
>> >>> + || gimple_code (stmt) == GIMPLE_PHI)
>> >>> + return false;
>> >>> + }
>> >>> +
>> >>> + return true;
>> >>> + }
>> >>> +
>> >>> +
>> >>> /* If LOOP has been versioned during ifcvt, return the internal call
>> >>> guarding it. */
>> >>> *************** pass_slp_vectorize::execute (function *f
>> >>> *** 692,697 ****
>> >>> --- 719,732 ----
>> >>> scev_initialize ();
>> >>> }
>> >>> + /* Mark all stmts as not belonging to the current region. */
>> >>> + FOR_EACH_BB_FN (bb, fun)
>> >>> + {
>> >>> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
>> >>> (gsi);
>> >>> + gsi_next (&gsi))
>> >>> + gimple_set_uid (gsi_stmt (gsi), -1);
>> >>> + }
>> >>> +
>> >>> init_stmt_vec_info_vec ();
>> >>> FOR_EACH_BB_FN (bb, fun)
>> >>> Index: gcc/config/aarch64/aarch64.c
>> >>> ===================================================================
>> >>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
>> >>> +0100
>> >>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027
>> >>> +0100
>> >>> ***************
>> >>> *** 52,57 ****
>> >>> --- 52,58 ----
>> >>> #include "params.h"
>> >>> #include "gimplify.h"
>> >>> #include "dwarf2.h"
>> >>> + #include "gimple-iterator.h"
>> >>> #include "tree-vectorizer.h"
>> >>> #include "aarch64-cost-tables.h"
>> >>> #include "dumpfile.h"
>> >>>
>> >
>>
>>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2015-11-10 15:20 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-06 9:43 [PATCH] Make BB vectorizer work on sub-BBs Richard Biener
2015-11-06 11:10 ` Richard Biener
2015-11-06 11:12 ` Kyrill Tkachov
2015-11-06 11:27 ` Richard Biener
2015-11-10 12:56 ` Christophe Lyon
2015-11-10 13:03 ` Richard Biener
2015-11-10 15:20 ` Christophe Lyon
2015-11-06 16:13 ` Jeff Law
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).