* [PATCH] Make BB vectorizer work on sub-BBs
@ 2015-11-06  9:43 Richard Biener
  2015-11-06 11:10 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2015-11-06  9:43 UTC
  To: gcc-patches


The following patch makes the BB vectorizer not only handle BB heads
(until the first stmt with a data reference it cannot handle) but
arbitrary regions in a BB separated by such stmts.

This improves the number of BB vectorizations from 469 to 556
in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
1x481.wrf failing both patched and unpatched (have to update my
config used for such experiments it seems ...)

Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.

I'm currently re-testing for a cosmetic change I made when writing
the changelog.

I expected (and there are) some issues with compile-time.  Left
is unpatched and right is patched.

'403.gcc': 00:00:54 (54)        | '403.gcc': 00:00:55 (55)
'483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
'416.gamess': 00:02:36 (156)    | '416.gamess': 00:02:37 (157)
'435.gromacs': 00:00:18 (18)    | '435.gromacs': 00:00:19 (19)
'447.dealII': 00:01:31 (91)     | '447.dealII': 00:01:33 (93)
'453.povray': 00:04:54 (294)    | '453.povray': 00:08:54 (534)
'454.calculix': 00:00:34 (34)   | '454.calculix': 00:00:52 (52)
'481.wrf': 00:01:57 (117)       | '481.wrf': 00:01:59 (119)

other benchmarks are unchanged.  I'm double-checking now that a followup
patch I have which re-implements BB vectorization dependence checking
fixes this (that's the only quadraticness I know of).

Richard.

2015-11-06  Richard Biener  <rguenther@suse.de>

        * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
        members.
        (vect_stmt_in_region_p): Declare.
        * tree-vect-slp.c (new_bb_vec_info): Work on a region.
        (destroy_bb_vec_info): Likewise.
        (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
        (vect_get_and_check_slp_defs): Likewise.
        (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
        (vect_slp_bb): Likewise.
        * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
        in terms of vect_stmt_in_region_p.
        (vect_pattern_recog): Iterate over the BB region.
        * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
        * tree-vectorizer.c (vect_stmt_in_region_p): New function.
        (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.

        * config/i386/i386.c: Include gimple-iterator.h.
        * config/aarch64/aarch64.c: Likewise.

        * gcc.dg/vect/bb-slp-38.c: New testcase.

Index: gcc/tree-vectorizer.h
===================================================================
*** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
*************** nested_in_vect_loop_p (struct loop *loop
*** 390,395 ****
--- 390,397 ----
  typedef struct _bb_vec_info : public vec_info
  {
    basic_block bb;
+   gimple_stmt_iterator region_begin;
+   gimple_stmt_iterator region_end;
  } *bb_vec_info;

  #define BB_VINFO_BB(B)    (B)->bb
*************** void vect_pattern_recog (vec_info *);
*** 1085,1089 ****
--- 1087,1092 ----
  /* In tree-vectorizer.c.  */
  unsigned vectorize_loops (void);
  void vect_destroy_datarefs (vec_info *);
+ bool vect_stmt_in_region_p (vec_info *, gimple *);

  #endif  /* GCC_TREE_VECTORIZER_H  */
Index: gcc/tree-vect-slp.c
===================================================================
*** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
*************** vect_get_and_check_slp_defs (vec_info *v
*** 209,215 ****
    unsigned int i, number_of_oprnds;
    gimple *def_stmt;
    enum vect_def_type dt = vect_uninitialized_def;
-   struct loop *loop = NULL;
    bool pattern = false;
    slp_oprnd_info oprnd_info;
    int first_op_idx = 1;
--- 209,214 ----
*************** vect_get_and_check_slp_defs (vec_info *v
*** 218,226 ****
    bool first = stmt_num == 0;
    bool second = stmt_num == 1;

-   if (is_a <loop_vec_info> (vinfo))
-     loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
-
    if (is_gimple_call (stmt))
      {
        number_of_oprnds = gimple_call_num_args (stmt);
--- 217,222 ----
*************** again:
*** 276,286 ****
               from the pattern.  Check that all the stmts of the node are in the
               pattern.  */
            if (def_stmt && gimple_bb (def_stmt)
!               && ((is_a <loop_vec_info> (vinfo)
!                    && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
!                   || (is_a <bb_vec_info> (vinfo)
!                       && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
!                       && gimple_code (def_stmt) != GIMPLE_PHI))
                && vinfo_for_stmt (def_stmt)
                && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
                && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
--- 272,278 ----
               from the pattern.  Check that all the stmts of the node are in the
               pattern.  */
            if (def_stmt && gimple_bb (def_stmt)
!               && vect_stmt_in_region_p (vinfo, def_stmt)
                && vinfo_for_stmt (def_stmt)
                && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
                && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
*************** vect_detect_hybrid_slp (loop_vec_info lo
*** 2076,2091 ****
     stmt_vec_info structs for all the stmts in it.  */

  static bb_vec_info
! new_bb_vec_info (basic_block bb)
  {
    bb_vec_info res = NULL;
    gimple_stmt_iterator gsi;

    res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
    res->kind = vec_info::bb;
    BB_VINFO_BB (res) = bb;

!   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);
        gimple_set_uid (stmt, 0);
--- 2068,2088 ----
     stmt_vec_info structs for all the stmts in it.  */

  static bb_vec_info
! new_bb_vec_info (gimple_stmt_iterator region_begin,
!                  gimple_stmt_iterator region_end)
  {
+   basic_block bb = gsi_bb (region_begin);
    bb_vec_info res = NULL;
    gimple_stmt_iterator gsi;

    res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
    res->kind = vec_info::bb;
    BB_VINFO_BB (res) = bb;
+   res->region_begin = region_begin;
+   res->region_end = region_end;

!   for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
!        gsi_next (&gsi))
      {
        gimple *stmt = gsi_stmt (gsi);
        gimple_set_uid (stmt, 0);
*************** destroy_bb_vec_info (bb_vec_info bb_vinf
*** 2118,2124 ****

    bb = BB_VINFO_BB (bb_vinfo);

!   for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
      {
        gimple *stmt = gsi_stmt (si);
        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
--- 2115,2122 ----

    bb = BB_VINFO_BB (bb_vinfo);

!   for (si = bb_vinfo->region_begin;
!        gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
      {
        gimple *stmt = gsi_stmt (si);
        stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
*************** destroy_bb_vec_info (bb_vec_info bb_vinf
*** 2126,2131 ****
--- 2124,2132 ----
        if (stmt_info)
          /* Free stmt_vec_info.  */
          free_stmt_vec_info (stmt);
+
+       /* Reset region marker.  */
+       gimple_set_uid (stmt, -1);
      }

    vect_destroy_datarefs (bb_vinfo);
*************** vect_bb_slp_scalar_cost (basic_block bb,
*** 2247,2254 ****
            gimple *use_stmt;
            FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
              if (!is_gimple_debug (use_stmt)
!                 && (gimple_code (use_stmt) == GIMPLE_PHI
!                     || gimple_bb (use_stmt) != bb
                      || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
                {
                  (*life)[i] = true;
--- 2248,2255 ----
            gimple *use_stmt;
            FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
              if (!is_gimple_debug (use_stmt)
!                 && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
!                                              use_stmt)
                      || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
                {
                  (*life)[i] = true;
*************** vect_bb_vectorization_profitable_p (bb_v
*** 2327,2366 ****
  /* Check if the basic block can be vectorized.  */

  static bb_vec_info
! vect_slp_analyze_bb_1 (basic_block bb)
  {
    bb_vec_info bb_vinfo;
    vec<slp_instance> slp_instances;
    slp_instance instance;
    int i;
    int min_vf = 2;
-   unsigned n_stmts = 0;

!   bb_vinfo = new_bb_vec_info (bb);
    if (!bb_vinfo)
      return NULL;

!   /* Gather all data references in the basic-block.  */
!
!   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
!        !gsi_end_p (gsi); gsi_next (&gsi))
!     {
!       gimple *stmt = gsi_stmt (gsi);
!       if (is_gimple_debug (stmt))
!         continue;
!       ++n_stmts;
!       if (!find_data_references_in_stmt (NULL, stmt,
!                                          &BB_VINFO_DATAREFS (bb_vinfo)))
!         {
!           /* Mark the rest of the basic-block as unvectorizable.  */
!           for (; !gsi_end_p (gsi); gsi_next (&gsi))
!             {
!               stmt = gsi_stmt (gsi);
!               STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
!             }
!           break;
!         }
!     }

    /* Analyze the data references.  */

--- 2328,2358 ----
  /* Check if the basic block can be vectorized.  */

  static bb_vec_info
! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
!                        gimple_stmt_iterator region_end,
!                        vec<data_reference_p> datarefs, int n_stmts)
  {
    bb_vec_info bb_vinfo;
    vec<slp_instance> slp_instances;
    slp_instance instance;
    int i;
    int min_vf = 2;

!   if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
!     {
!       if (dump_enabled_p ())
!         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
!                          "not vectorized: too many instructions in "
!                          "basic block.\n");
!       free_data_refs (datarefs);
!       return NULL;
!     }
!
!   bb_vinfo = new_bb_vec_info (region_begin, region_end);
    if (!bb_vinfo)
      return NULL;

!   BB_VINFO_DATAREFS (bb_vinfo) = datarefs;

    /* Analyze the data references.  */

*************** vect_slp_analyze_bb_1 (basic_block bb)
*** 2438,2445 ****
      }

    /* Mark all the statements that we do not want to vectorize.  */
!   for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
!        !gsi_end_p (gsi); gsi_next (&gsi))
      {
        stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
        if (STMT_SLP_TYPE (vinfo) != pure_slp)
--- 2430,2437 ----
      }

    /* Mark all the statements that we do not want to vectorize.  */
!   for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
!        gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
      {
        stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
        if (STMT_SLP_TYPE (vinfo) != pure_slp)
*************** bool
*** 2509,2585 ****
  vect_slp_bb (basic_block bb)
  {
    bb_vec_info bb_vinfo;
-   int insns = 0;
    gimple_stmt_iterator gsi;
    unsigned int vector_sizes;

    if (dump_enabled_p ())
      dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");

-   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
-     {
-       gimple *stmt = gsi_stmt (gsi);
-       if (!is_gimple_debug (stmt)
-           && !gimple_nop_p (stmt)
-           && gimple_code (stmt) != GIMPLE_LABEL)
-         insns++;
-       if (gimple_location (stmt) != UNKNOWN_LOCATION)
-         vect_location = gimple_location (stmt);
-     }
-
-   if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
-     {
-       if (dump_enabled_p ())
-         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-                          "not vectorized: too many instructions in "
-                          "basic block.\n");
-
-       return false;
-     }
-
    /* Autodetect first vector size we try.  */
    current_vector_size = 0;
    vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();

    while (1)
      {
!       bb_vinfo = vect_slp_analyze_bb_1 (bb);
!       if (bb_vinfo)
          {
!           if (!dbg_cnt (vect_slp))
!             {
!               destroy_bb_vec_info (bb_vinfo);
!               return false;
!             }

            if (dump_enabled_p ())
!             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");

            vect_schedule_slp (bb_vinfo);

            if (dump_enabled_p ())
              dump_printf_loc (MSG_NOTE, vect_location,
!                              "BASIC BLOCK VECTORIZED\n");

            destroy_bb_vec_info (bb_vinfo);

!           return true;
          }

!       destroy_bb_vec_info (bb_vinfo);

        vector_sizes &= ~current_vector_size;
!       if (vector_sizes == 0
!           || current_vector_size == 0)
!         return false;

!       /* Try the next biggest vector size.  */
!       current_vector_size = 1 << floor_log2 (vector_sizes);
!       if (dump_enabled_p ())
!         dump_printf_loc (MSG_NOTE, vect_location,
!                          "***** Re-trying analysis with "
!                          "vector size %d\n", current_vector_size);
      }
  }


--- 2501,2605 ----
  vect_slp_bb (basic_block bb)
  {
    bb_vec_info bb_vinfo;
    gimple_stmt_iterator gsi;
    unsigned int vector_sizes;
+   bool any_vectorized = false;

    if (dump_enabled_p ())
      dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");

    /* Autodetect first vector size we try.  */
    current_vector_size = 0;
    vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();

+   gsi = gsi_start_bb (bb);
+
    while (1)
      {
!       if (gsi_end_p (gsi))
!         break;
!
!       gimple_stmt_iterator region_begin = gsi;
!       vec<data_reference_p> datarefs = vNULL;
!       int insns = 0;
!
!       for (; !gsi_end_p (gsi); gsi_next (&gsi))
          {
!           gimple *stmt = gsi_stmt (gsi);
!           if (is_gimple_debug (stmt))
!             continue;
!           insns++;
!
!           if (gimple_location (stmt) != UNKNOWN_LOCATION)
!             vect_location = gimple_location (stmt);
!
!           if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
!             break;
!         }
!
!       /* Skip leading unhandled stmts.  */
!       if (gsi_stmt (region_begin) == gsi_stmt (gsi))
!         {
!           gsi_next (&gsi);
!           continue;
!         }
!
!       gimple_stmt_iterator region_end = gsi;

+       bool vectorized = false;
+       bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
+                                         datarefs, insns);
+       if (bb_vinfo
+           && dbg_cnt (vect_slp))
+         {
            if (dump_enabled_p ())
!             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");

            vect_schedule_slp (bb_vinfo);

            if (dump_enabled_p ())
              dump_printf_loc (MSG_NOTE, vect_location,
!                              "basic block part vectorized\n");

            destroy_bb_vec_info (bb_vinfo);

!           vectorized = true;
          }
+       else
+         destroy_bb_vec_info (bb_vinfo);

!       any_vectorized |= vectorized;

        vector_sizes &= ~current_vector_size;
!       if (vectorized
!           || vector_sizes == 0
!           || current_vector_size == 0)
!         {
!           if (gsi_end_p (region_end))
!             break;
!
!           /* Skip the unhandled stmt.  */
!           gsi_next (&gsi);
!
!           /* And reset vector sizes.  */
!           current_vector_size = 0;
!           vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
!         }
!       else
!         {
!           /* Try the next biggest vector size.  */
!           current_vector_size = 1 << floor_log2 (vector_sizes);
!           if (dump_enabled_p ())
!             dump_printf_loc (MSG_NOTE, vect_location,
!                              "***** Re-trying analysis with "
!                              "vector size %d\n", current_vector_size);

!           /* Start over.  */
!           gsi = region_begin;
!         }
      }
+
+   return any_vectorized;
  }


Index: gcc/tree-vect-patterns.c
===================================================================
*** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
--- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
*************** static bool
*** 107,133 ****
  vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
  {
    stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
!   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
!   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
!
!   if (!gimple_bb (stmt2))
!     return false;
!
!   if (loop_vinfo)
!     {
!       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
!       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
!         return false;
!     }
!   else
!     {
!       if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
!           || gimple_code (stmt2) == GIMPLE_PHI)
!         return false;
!     }
!
!   gcc_assert (vinfo_for_stmt (stmt2));
!   return true;
  }

  /* If the LHS of DEF_STMT has a single use, and that statement is
--- 107,113 ----
  vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
  {
    stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
!   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
  }

  /* If the LHS of DEF_STMT has a single use, and that statement is
*************** vect_pattern_recog (vec_info *vinfo)
*** 3611,3643 ****
        loop = LOOP_VINFO_LOOP (loop_vinfo);
        bbs = LOOP_VINFO_BBS (loop_vinfo);
        nbbs = loop->num_nodes;
      }
    else
      {
!       bbs = &as_a <bb_vec_info> (vinfo)->bb;
!       nbbs = 1;
!     }
!
!   /* Scan through the loop stmts, applying the pattern recognition
!      functions starting at each stmt visited:  */
!   for (i = 0; i < nbbs; i++)
!     {
!       basic_block bb = bbs[i];
!       for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
!         {
!           if (is_a <bb_vec_info> (vinfo)
!               && (stmt = gsi_stmt (si))
                && vinfo_for_stmt (stmt)
                && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
!            continue;

!           /* Scan over all generic vect_recog_xxx_pattern functions.  */
!           for (j = 0; j < NUM_PATTERNS; j++)
!             {
                vect_recog_func = vect_vect_recog_func_ptrs[j];
                vect_pattern_recog_1 (vect_recog_func, si,
                                      &stmts_to_replace);
!             }
!         }
      }
  }

--- 3591,3632 ----
        loop = LOOP_VINFO_LOOP (loop_vinfo);
        bbs = LOOP_VINFO_BBS (loop_vinfo);
        nbbs = loop->num_nodes;
+
+       /* Scan through the loop stmts, applying the pattern recognition
+          functions starting at each stmt visited:  */
+       for (i = 0; i < nbbs; i++)
+         {
+           basic_block bb = bbs[i];
+           for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
+             {
+               /* Scan over all generic vect_recog_xxx_pattern functions.  */
+               for (j = 0; j < NUM_PATTERNS; j++)
+                 {
+                   vect_recog_func = vect_vect_recog_func_ptrs[j];
+                   vect_pattern_recog_1 (vect_recog_func, si,
+                                         &stmts_to_replace);
+                 }
+             }
+         }
      }
    else
      {
!       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
!       for (si = bb_vinfo->region_begin;
!            gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
!         {
!           if ((stmt = gsi_stmt (si))
                && vinfo_for_stmt (stmt)
                && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
!             continue;

!           /* Scan over all generic vect_recog_xxx_pattern functions.  */
!           for (j = 0; j < NUM_PATTERNS; j++)
!             {
                vect_recog_func = vect_vect_recog_func_ptrs[j];
                vect_pattern_recog_1 (vect_recog_func, si,
                                      &stmts_to_replace);
!             }
!         }
      }
  }

Index: gcc/config/i386/i386.c
===================================================================
*** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
--- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
*************** along with GCC; see the file COPYING3.
*** 64,69 ****
--- 64,70 ----
  #include "context.h"
  #include "pass_manager.h"
  #include "target-globals.h"
+ #include "gimple-iterator.h"
  #include "tree-vectorizer.h"
  #include "shrink-wrap.h"
  #include "builtins.h"
Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
===================================================================
*** /dev/null 1970-01-01 00:00:00.000000000 +0000
--- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100
***************
*** 0 ****
--- 1,44 ----
+ /* { dg-require-effective-target vect_int } */
+
+ #include "tree-vect.h"
+
+ extern void abort (void);
+
+ int a[8], b[8];
+ int x;
+
+ void __attribute__((noinline,noclone))
+ bar (void)
+ {
+   x = 1;
+ }
+
+ void __attribute__((noinline,noclone))
+ foo(void)
+ {
+   a[0] = b[0];
+   a[1] = b[0];
+   a[2] = b[3];
+   a[3] = b[3];
+   bar ();
+   a[4] = b[4];
+   a[5] = b[7];
+   a[6] = b[4];
+   a[7] = b[7];
+ }
+
+ int main()
+ {
+   int i;
+   check_vect ();
+   for (i = 0; i < 8; ++i)
+     b[i] = i;
+   foo ();
+   if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
+       || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
+     abort ();
+   return 0;
+ }
+
+ /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */
+ /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } */
Index: gcc/tree-vect-stmts.c
===================================================================
*** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
--- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
*************** vect_is_simple_use (tree operand, vec_in
*** 8196,8207 ****
            dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
          }

!       basic_block bb = gimple_bb (*def_stmt);
!       if ((is_a <loop_vec_info> (vinfo)
!            && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb))
!           || (is_a <bb_vec_info> (vinfo)
!               && (bb != as_a <bb_vec_info> (vinfo)->bb
!                   || gimple_code (*def_stmt) == GIMPLE_PHI)))
          *dt = vect_external_def;
        else
          {
--- 8196,8202 ----
            dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
          }

!       if (! vect_stmt_in_region_p (vinfo, *def_stmt))
          *dt = vect_external_def;
        else
          {
Index: gcc/tree-vectorizer.c
===================================================================
*** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
--- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
*************** vect_destroy_datarefs (vec_info *vinfo)
*** 350,355 ****
--- 350,382 ----
  }


+ /* Return whether STMT is inside the region we try to vectorize.  */
+
+ bool
+ vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
+ {
+   if (!gimple_bb (stmt))
+     return false;
+
+   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
+     {
+       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
+       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
+         return false;
+     }
+   else
+     {
+       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
+       if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
+           || gimple_uid (stmt) == -1U
+           || gimple_code (stmt) == GIMPLE_PHI)
+         return false;
+     }
+
+   return true;
+ }
+
+
  /* If LOOP has been versioned during ifcvt, return the internal call
     guarding it.  */

*************** pass_slp_vectorize::execute (function *f
*** 692,697 ****
--- 719,732 ----
        scev_initialize ();
      }

+   /* Mark all stmts as not belonging to the current region.  */
+   FOR_EACH_BB_FN (bb, fun)
+     {
+       for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
+            gsi_next (&gsi))
+         gimple_set_uid (gsi_stmt (gsi), -1);
+     }
+
    init_stmt_vec_info_vec ();

    FOR_EACH_BB_FN (bb, fun)
Index: gcc/config/aarch64/aarch64.c
===================================================================
*** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100
--- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
***************
*** 52,57 ****
--- 52,58 ----
  #include "params.h"
  #include "gimplify.h"
  #include "dwarf2.h"
+ #include "gimple-iterator.h"
  #include "tree-vectorizer.h"
  #include "aarch64-cost-tables.h"
  #include "dumpfile.h"
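
For readers skimming the patch, the region splitting done by the new loop
in vect_slp_bb can be sketched outside of GCC roughly as follows.  This is
only an illustration under stated assumptions, not the code from the patch:
Stmt, Region and split_regions are hypothetical names, the `handled' flag
stands in for find_data_references_in_stmt succeeding, and debug stmts, the
vector-size retry and the SLP analysis itself are all left out.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement.  `handled` models whether
// data-reference analysis would succeed on the statement.
struct Stmt
{
  bool handled;
};

// Half-open index range [first, second) of consecutive handled statements.
using Region = std::pair<std::size_t, std::size_t>;

// Split a block into maximal regions separated by unhandled statements.
std::vector<Region>
split_regions (const std::vector<Stmt> &stmts)
{
  std::vector<Region> regions;
  std::size_t i = 0;
  while (i < stmts.size ())
    {
      std::size_t begin = i;

      // Extend the region up to the first statement we cannot handle.
      while (i < stmts.size () && stmts[i].handled)
        ++i;

      // Skip leading unhandled stmts.
      if (i == begin)
        {
          ++i;
          continue;
        }

      regions.emplace_back (begin, i);

      // Skip the unhandled stmt that ended this region, if there is one.
      if (i < stmts.size ())
        ++i;
    }
  return regions;
}
```

With the shape of foo () in bb-slp-38.c (four stores, a call, four stores),
and assuming the call is the stmt that cannot be handled, this yields the
two regions [0, 4) and [5, 9), which lines up with the testcase expecting
"basic block part vectorized" twice.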
* Re: [PATCH] Make BB vectorizer work on sub-BBs
  2015-11-06  9:43 [PATCH] Make BB vectorizer work on sub-BBs Richard Biener
@ 2015-11-06 11:10 ` Richard Biener
  2015-11-06 11:12   ` Kyrill Tkachov
  2015-11-06 16:13   ` Jeff Law
  0 siblings, 2 replies; 8+ messages in thread
From: Richard Biener @ 2015-11-06 11:10 UTC
  To: gcc-patches

On Fri, 6 Nov 2015, Richard Biener wrote:

>
> The following patch makes the BB vectorizer not only handle BB heads
> (until the first stmt with a data reference it cannot handle) but
> arbitrary regions in a BB separated by such stmts.
>
> This improves the number of BB vectorizations from 469 to 556
> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
> 1x481.wrf failing both patched and unpatched (have to update my
> config used for such experiments it seems ...)
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>
> I'm currently re-testing for a cosmetic change I made when writing
> the changelog.
>
> I expected (and there are) some issues with compile-time.  Left
> is unpatched and right is patched.
>
> '403.gcc': 00:00:54 (54)        | '403.gcc': 00:00:55 (55)
> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
> '416.gamess': 00:02:36 (156)    | '416.gamess': 00:02:37 (157)
> '435.gromacs': 00:00:18 (18)    | '435.gromacs': 00:00:19 (19)
> '447.dealII': 00:01:31 (91)     | '447.dealII': 00:01:33 (93)
> '453.povray': 00:04:54 (294)    | '453.povray': 00:08:54 (534)
> '454.calculix': 00:00:34 (34)   | '454.calculix': 00:00:52 (52)
> '481.wrf': 00:01:57 (117)       | '481.wrf': 00:01:59 (119)
>
> other benchmarks are unchanged.  I'm double-checking now that a followup
> patch I have which re-implements BB vectorization dependence checking
> fixes this (that's the only quadraticness I know of).

Fixes all but

'453.povray': 00:04:54 (294)    | '453.povray': 00:06:46 (406)

it even improves compile-time on some:

'464.h264ref': 00:00:26 (26)    | '464.h264ref': 00:00:21 (21)

it also increases the number of vectorized BBs to 722.

Needs some work still though.

Richard.

> Richard.
>
> 2015-11-06  Richard Biener  <rguenther@suse.de>
>
>         * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>         members.
>         (vect_stmt_in_region_p): Declare.
>         * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>         (destroy_bb_vec_info): Likewise.
>         (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>         (vect_get_and_check_slp_defs): Likewise.
>         (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>         (vect_slp_bb): Likewise.
>         * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>         in terms of vect_stmt_in_region_p.
>         (vect_pattern_recog): Iterate over the BB region.
>         * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
>         * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>         (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>
>         * config/i386/i386.c: Include gimple-iterator.h.
>         * config/aarch64/aarch64.c: Likewise.
>
>         * gcc.dg/vect/bb-slp-38.c: New testcase.
>
> Index: gcc/tree-vectorizer.h
> ===================================================================
> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
> *************** nested_in_vect_loop_p (struct loop *loop
> *** 390,395 ****
> --- 390,397 ----
>   typedef struct _bb_vec_info : public vec_info
>   {
>     basic_block bb;
> +   gimple_stmt_iterator region_begin;
> +   gimple_stmt_iterator region_end;
>   } *bb_vec_info;
>
>   #define BB_VINFO_BB(B)    (B)->bb
> *************** void vect_pattern_recog (vec_info *);
> *** 1085,1089 ****
> --- 1087,1092 ----
>   /* In tree-vectorizer.c.  */
>   unsigned vectorize_loops (void);
>   void vect_destroy_datarefs (vec_info *);
> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>
>   #endif  /* GCC_TREE_VECTORIZER_H  */
> Index: gcc/tree-vect-slp.c
> ===================================================================
> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
> *************** vect_get_and_check_slp_defs (vec_info *v
> *** 209,215 ****
>     unsigned int i, number_of_oprnds;
>     gimple *def_stmt;
>     enum vect_def_type dt = vect_uninitialized_def;
> -   struct loop *loop = NULL;
>     bool pattern = false;
>     slp_oprnd_info oprnd_info;
>     int first_op_idx = 1;
> --- 209,214 ----
> *************** vect_get_and_check_slp_defs (vec_info *v
> *** 218,226 ****
>     bool first = stmt_num == 0;
>     bool second = stmt_num == 1;
>
> -   if (is_a <loop_vec_info> (vinfo))
> -     loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
> -
>     if (is_gimple_call (stmt))
>       {
>         number_of_oprnds = gimple_call_num_args (stmt);
> --- 217,222 ----
> *************** again:
> *** 276,286 ****
>                from the pattern.  Check that all the stmts of the node are in the
>                pattern.  */
>             if (def_stmt && gimple_bb (def_stmt)
> !               && ((is_a <loop_vec_info> (vinfo)
> !                    && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
> !                   || (is_a <bb_vec_info> (vinfo)
> !                       && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb
> !                       && gimple_code (def_stmt) != GIMPLE_PHI))
>                 && vinfo_for_stmt (def_stmt)
>                 && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>                 && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> --- 272,278 ----
>                from the pattern.  Check that all the stmts of the node are in the
>                pattern.  */
>             if (def_stmt && gimple_bb (def_stmt)
> !               && vect_stmt_in_region_p (vinfo, def_stmt)
>                 && vinfo_for_stmt (def_stmt)
>                 && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>                 && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
> *************** vect_detect_hybrid_slp (loop_vec_info lo
> *** 2076,2091 ****
>      stmt_vec_info structs for all the stmts in it.  */
>
>   static bb_vec_info
> ! new_bb_vec_info (basic_block bb)
>   {
>     bb_vec_info res = NULL;
>     gimple_stmt_iterator gsi;
>
>     res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>     res->kind = vec_info::bb;
>     BB_VINFO_BB (res) = bb;
>
> !   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>       {
>         gimple *stmt = gsi_stmt (gsi);
>         gimple_set_uid (stmt, 0);
> --- 2068,2088 ----
>      stmt_vec_info structs for all the stmts in it.  */
>
>   static bb_vec_info
> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
> !                  gimple_stmt_iterator region_end)
>   {
> +   basic_block bb = gsi_bb (region_begin);
>     bb_vec_info res = NULL;
>     gimple_stmt_iterator gsi;
>
>     res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>     res->kind = vec_info::bb;
>     BB_VINFO_BB (res) = bb;
> +   res->region_begin = region_begin;
> +   res->region_end = region_end;
>
> !   for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
> !        gsi_next (&gsi))
>       {
>         gimple *stmt = gsi_stmt (gsi);
>         gimple_set_uid (stmt, 0);
> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> *** 2118,2124 ****
>
>     bb = BB_VINFO_BB (bb_vinfo);
>
> !   for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>       {
>         gimple *stmt = gsi_stmt (si);
>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> --- 2115,2122 ----
>
>     bb = BB_VINFO_BB (bb_vinfo);
>
> !   for (si = bb_vinfo->region_begin;
> !        gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si))
>       {
>         gimple *stmt = gsi_stmt (si);
>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> *** 2126,2131 ****
> --- 2124,2132 ----
>         if (stmt_info)
>           /* Free stmt_vec_info.  */
>           free_stmt_vec_info (stmt);
> +
> +       /* Reset region marker.  */
> +       gimple_set_uid (stmt, -1);
>       }
>
>     vect_destroy_datarefs (bb_vinfo);
> *************** vect_bb_slp_scalar_cost (basic_block bb,
> *** 2247,2254 ****
>             gimple *use_stmt;
>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>               if (!is_gimple_debug (use_stmt)
> !                 && (gimple_code (use_stmt) == GIMPLE_PHI
> !                     || gimple_bb (use_stmt) != bb
>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
>                 {
>                   (*life)[i] = true;
> --- 2248,2255 ----
>             gimple *use_stmt;
>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p))
>               if (!is_gimple_debug (use_stmt)
> !                 && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo,
> !                                              use_stmt)
>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt))))
>                 {
>                   (*life)[i] = true;
> *************** vect_bb_vectorization_profitable_p (bb_v
> *** 2327,2366 ****
>   /* Check if the basic block can be vectorized.  */
>
>   static bb_vec_info
> ! vect_slp_analyze_bb_1 (basic_block bb)
>   {
>     bb_vec_info bb_vinfo;
>     vec<slp_instance> slp_instances;
>     slp_instance instance;
>     int i;
>     int min_vf = 2;
> -   unsigned n_stmts = 0;
>
> !   bb_vinfo = new_bb_vec_info (bb);
>     if (!bb_vinfo)
>       return NULL;
>
> !   /* Gather all data references in the basic-block.  */
> !
> !   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> !        !gsi_end_p (gsi); gsi_next (&gsi))
> !     {
> !       gimple *stmt = gsi_stmt (gsi);
> !       if (is_gimple_debug (stmt))
> !         continue;
> !       ++n_stmts;
> !       if (!find_data_references_in_stmt (NULL, stmt,
> !                                          &BB_VINFO_DATAREFS (bb_vinfo)))
> !         {
> !           /* Mark the rest of the basic-block as unvectorizable.  */
> !           for (; !gsi_end_p (gsi); gsi_next (&gsi))
> !             {
> !               stmt = gsi_stmt (gsi);
> !               STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
> !             }
> !           break;
> !         }
> !     }
>
>     /* Analyze the data references.  */
>
> --- 2328,2358 ----
>   /* Check if the basic block can be vectorized.  */
>
>   static bb_vec_info
> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
> !                        gimple_stmt_iterator region_end,
> !                        vec<data_reference_p> datarefs, int n_stmts)
>   {
>     bb_vec_info bb_vinfo;
>     vec<slp_instance> slp_instances;
>     slp_instance instance;
>     int i;
>     int min_vf = 2;
>
> !   if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> !     {
> !       if (dump_enabled_p ())
> !         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> !                          "not vectorized: too many instructions in "
> !                          "basic block.\n");
> !       free_data_refs (datarefs);
> !       return NULL;
> !     }
> !
> !   bb_vinfo = new_bb_vec_info (region_begin, region_end);
>     if (!bb_vinfo)
>       return NULL;
>
> !   BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>
>     /* Analyze the data references.  */
>
> *************** vect_slp_analyze_bb_1 (basic_block bb)
> *** 2438,2445 ****
>       }
>
>     /* Mark all the statements that we do not want to vectorize.  */
> !   for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo));
> !        !gsi_end_p (gsi); gsi_next (&gsi))
>       {
>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
> --- 2430,2437 ----
>       }
>
>     /* Mark all the statements that we do not want to vectorize.  */
> !   for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
> !        gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi))
>       {
>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
> *************** bool
> *** 2509,2585 ****
>   vect_slp_bb (basic_block bb)
>   {
>     bb_vec_info bb_vinfo;
> -   int insns = 0;
>     gimple_stmt_iterator gsi;
>     unsigned int vector_sizes;
>
>     if (dump_enabled_p ())
>       dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>
> -   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> -     {
> -       gimple *stmt = gsi_stmt (gsi);
> -       if (!is_gimple_debug (stmt)
> -           && !gimple_nop_p (stmt)
> -           && gimple_code (stmt) != GIMPLE_LABEL)
> -         insns++;
> -       if (gimple_location (stmt) != UNKNOWN_LOCATION)
> -         vect_location = gimple_location (stmt);
> -     }
> -
> -   if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> -     {
> -       if (dump_enabled_p ())
> -         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> -                          "not vectorized: too many instructions in "
> -                          "basic block.\n");
> -
> -       return false;
> -     }
> -
>     /* Autodetect first vector size we try.  */
>     current_vector_size = 0;
>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>
>     while (1)
>       {
> !       bb_vinfo = vect_slp_analyze_bb_1 (bb);
> !       if (bb_vinfo)
>           {
> !           if (!dbg_cnt (vect_slp))
> !             {
> !               destroy_bb_vec_info (bb_vinfo);
> !               return false;
> !             }
>
>             if (dump_enabled_p ())
> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>
>             vect_schedule_slp (bb_vinfo);
>
>             if (dump_enabled_p ())
>               dump_printf_loc (MSG_NOTE, vect_location,
> !                              "BASIC BLOCK VECTORIZED\n");
>
>             destroy_bb_vec_info (bb_vinfo);
>
> !           return true;
>           }
>
> !       destroy_bb_vec_info (bb_vinfo);
>
>         vector_sizes &= ~current_vector_size;
> !       if (vector_sizes == 0
> !           || current_vector_size == 0)
> !         return false;
>
> !       /* Try the next biggest vector size.  */
> !       current_vector_size = 1 << floor_log2 (vector_sizes);
> !       if (dump_enabled_p ())
> !         dump_printf_loc (MSG_NOTE, vect_location,
> !                          "***** Re-trying analysis with "
> !                          "vector size %d\n", current_vector_size);
>       }
>   }
>
>
> --- 2501,2605 ----
>   vect_slp_bb (basic_block bb)
>   {
>     bb_vec_info bb_vinfo;
>     gimple_stmt_iterator gsi;
>     unsigned int vector_sizes;
> +   bool any_vectorized = false;
>
>     if (dump_enabled_p ())
>       dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n");
>
>     /* Autodetect first vector size we try.  */
>     current_vector_size = 0;
>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>
> +   gsi = gsi_start_bb (bb);
> +
>     while (1)
>       {
> !       if (gsi_end_p (gsi))
> !         break;
> !
> !       gimple_stmt_iterator region_begin = gsi;
> !       vec<data_reference_p> datarefs = vNULL;
> !       int insns = 0;
> !
> !       for (; !gsi_end_p (gsi); gsi_next (&gsi))
>           {
> !           gimple *stmt = gsi_stmt (gsi);
> !           if (is_gimple_debug (stmt))
> !             continue;
> !           insns++;
> !
> !           if (gimple_location (stmt) != UNKNOWN_LOCATION)
> !             vect_location = gimple_location (stmt);
> !
> !           if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
> !             break;
> !         }
> !
> !       /* Skip leading unhandled stmts.  */
> !       if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> !         {
> !           gsi_next (&gsi);
> !           continue;
> !         }
> !
> !       gimple_stmt_iterator region_end = gsi;
>
> +       bool vectorized = false;
> +       bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
> +                                         datarefs, insns);
> +       if (bb_vinfo
> +           && dbg_cnt (vect_slp))
> +         {
>             if (dump_enabled_p ())
> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n");
>
>             vect_schedule_slp (bb_vinfo);
>
>             if (dump_enabled_p ())
>               dump_printf_loc (MSG_NOTE, vect_location,
> !                              "basic block part vectorized\n");
>
>             destroy_bb_vec_info (bb_vinfo);
>
> !           vectorized = true;
>           }
> +       else
> +         destroy_bb_vec_info (bb_vinfo);
>
> !       any_vectorized |= vectorized;
>
>         vector_sizes &= ~current_vector_size;
> !       if (vectorized
> !           || vector_sizes == 0
> !           || current_vector_size == 0)
> !         {
> !           if (gsi_end_p (region_end))
> !             break;
> !
> !           /* Skip the unhandled stmt.  */
> !           gsi_next (&gsi);
> !
> !           /* And reset vector sizes.  */
> !           current_vector_size = 0;
> !           vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> !         }
> !       else
> !         {
> !           /* Try the next biggest vector size.  */
> !           current_vector_size = 1 << floor_log2 (vector_sizes);
> !           if (dump_enabled_p ())
> !             dump_printf_loc (MSG_NOTE, vect_location,
> !                              "***** Re-trying analysis with "
> !                              "vector size %d\n", current_vector_size);
>
> !           /* Start over.  */
> !           gsi = region_begin;
> !         }
>       }
> +
> +   return any_vectorized;
>   }
>
>
> Index: gcc/tree-vect-patterns.c
> ===================================================================
> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100
> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
> *************** static bool
> *** 107,133 ****
>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>   {
>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> !   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
> !   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
> !
> !   if (!gimple_bb (stmt2))
> !     return false;
> !
> !   if (loop_vinfo)
> !     {
> !       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> !       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
> !         return false;
> !     }
> !   else
> !     {
> !       if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
> !           || gimple_code (stmt2) == GIMPLE_PHI)
> !         return false;
> !     }
> !
> !   gcc_assert (vinfo_for_stmt (stmt2));
> !   return true;
>   }
>
>   /* If the LHS of DEF_STMT has a single use, and that statement is
> --- 107,113 ----
>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>   {
>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> !
return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2); > } > > /* If the LHS of DEF_STMT has a single use, and that statement is > *************** vect_pattern_recog (vec_info *vinfo) > *** 3611,3643 **** > loop = LOOP_VINFO_LOOP (loop_vinfo); > bbs = LOOP_VINFO_BBS (loop_vinfo); > nbbs = loop->num_nodes; > } > else > { > ! bbs = &as_a <bb_vec_info> (vinfo)->bb; > ! nbbs = 1; > ! } > ! > ! /* Scan through the loop stmts, applying the pattern recognition > ! functions starting at each stmt visited: */ > ! for (i = 0; i < nbbs; i++) > ! { > ! basic_block bb = bbs[i]; > ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > ! { > ! if (is_a <bb_vec_info> (vinfo) > ! && (stmt = gsi_stmt (si)) > && vinfo_for_stmt (stmt) > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) > ! continue; > > ! /* Scan over all generic vect_recog_xxx_pattern functions. */ > ! for (j = 0; j < NUM_PATTERNS; j++) > ! { > vect_recog_func = vect_vect_recog_func_ptrs[j]; > vect_pattern_recog_1 (vect_recog_func, si, > &stmts_to_replace); > ! } > ! } > } > } > --- 3591,3632 ---- > loop = LOOP_VINFO_LOOP (loop_vinfo); > bbs = LOOP_VINFO_BBS (loop_vinfo); > nbbs = loop->num_nodes; > + > + /* Scan through the loop stmts, applying the pattern recognition > + functions starting at each stmt visited: */ > + for (i = 0; i < nbbs; i++) > + { > + basic_block bb = bbs[i]; > + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > + { > + /* Scan over all generic vect_recog_xxx_pattern functions. */ > + for (j = 0; j < NUM_PATTERNS; j++) > + { > + vect_recog_func = vect_vect_recog_func_ptrs[j]; > + vect_pattern_recog_1 (vect_recog_func, si, > + &stmts_to_replace); > + } > + } > + } > } > else > { > ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); > ! for (si = bb_vinfo->region_begin; > ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si)) > ! { > ! if ((stmt = gsi_stmt (si)) > && vinfo_for_stmt (stmt) > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) > ! 
continue; > > ! /* Scan over all generic vect_recog_xxx_pattern functions. */ > ! for (j = 0; j < NUM_PATTERNS; j++) > ! { > vect_recog_func = vect_vect_recog_func_ptrs[j]; > vect_pattern_recog_1 (vect_recog_func, si, > &stmts_to_replace); > ! } > ! } > } > } > Index: gcc/config/i386/i386.c > =================================================================== > *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100 > --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100 > *************** along with GCC; see the file COPYING3. > *** 64,69 **** > --- 64,70 ---- > #include "context.h" > #include "pass_manager.h" > #include "target-globals.h" > + #include "gimple-iterator.h" > #include "tree-vectorizer.h" > #include "shrink-wrap.h" > #include "builtins.h" > Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c > =================================================================== > *** /dev/null 1970-01-01 00:00:00.000000000 +0000 > --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100 > *************** > *** 0 **** > --- 1,44 ---- > + /* { dg-require-effective-target vect_int } */ > + > + #include "tree-vect.h" > + > + extern void abort (void); > + > + int a[8], b[8]; > + int x; > + > + void __attribute__((noinline,noclone)) > + bar (void) > + { > + x = 1; > + } > + > + void __attribute__((noinline,noclone)) > + foo(void) > + { > + a[0] = b[0]; > + a[1] = b[0]; > + a[2] = b[3]; > + a[3] = b[3]; > + bar (); > + a[4] = b[4]; > + a[5] = b[7]; > + a[6] = b[4]; > + a[7] = b[7]; > + } > + > + int main() > + { > + int i; > + check_vect (); > + for (i = 0; i < 8; ++i) > + b[i] = i; > + foo (); > + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3 > + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7) > + abort (); > + return 0; > + } > + > + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */ > + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } 
*/ > Index: gcc/tree-vect-stmts.c > =================================================================== > *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100 > --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100 > *************** vect_is_simple_use (tree operand, vec_in > *** 8196,8207 **** > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); > } > > ! basic_block bb = gimple_bb (*def_stmt); > ! if ((is_a <loop_vec_info> (vinfo) > ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb)) > ! || (is_a <bb_vec_info> (vinfo) > ! && (bb != as_a <bb_vec_info> (vinfo)->bb > ! || gimple_code (*def_stmt) == GIMPLE_PHI))) > *dt = vect_external_def; > else > { > --- 8196,8202 ---- > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); > } > > ! if (! vect_stmt_in_region_p (vinfo, *def_stmt)) > *dt = vect_external_def; > else > { > Index: gcc/tree-vectorizer.c > =================================================================== > *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100 > --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100 > *************** vect_destroy_datarefs (vec_info *vinfo) > *** 350,355 **** > --- 350,382 ---- > } > > > + /* Return whether STMT is inside the region we try to vectorize. */ > + > + bool > + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt) > + { > + if (!gimple_bb (stmt)) > + return false; > + > + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo)) > + { > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt))) > + return false; > + } > + else > + { > + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); > + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo) > + || gimple_uid (stmt) == -1U > + || gimple_code (stmt) == GIMPLE_PHI) > + return false; > + } > + > + return true; > + } > + > + > /* If LOOP has been versioned during ifcvt, return the internal call > guarding it. 
*/
>
> *************** pass_slp_vectorize::execute (function *f
> *** 692,697 ****
> --- 719,732 ----
>         scev_initialize ();
>       }
>
> +   /* Mark all stmts as not belonging to the current region.  */
> +   FOR_EACH_BB_FN (bb, fun)
> +     {
> +       for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi);
> +            gsi_next (&gsi))
> +         gimple_set_uid (gsi_stmt (gsi), -1);
> +     }
> +
>     init_stmt_vec_info_vec ();
>
>     FOR_EACH_BB_FN (bb, fun)
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100
> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100
> ***************
> *** 52,57 ****
> --- 52,58 ----
>   #include "params.h"
>   #include "gimplify.h"
>   #include "dwarf2.h"
> + #include "gimple-iterator.h"
>   #include "tree-vectorizer.h"
>   #include "aarch64-cost-tables.h"
>   #include "dumpfile.h"
>

--
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
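For readers following the thread, here is a minimal standalone sketch of the region walk that the new vect_slp_bb loop in the quoted patch performs. It is not GCC code: Stmt, Region and split_into_regions are hypothetical names, and the boolean "handled" flag is an assumed stand-in for the result of find_data_references_in_stmt. The sketch only shows how a block is cut into maximal runs of handled statements; the analysis, the vector-size retries and the debug-stmt handling are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GIMPLE statement.  The only property the
// sketch needs is whether its data references could be analyzed.
struct Stmt
{
  bool handled;
};

// Half-open index range [begin, end) of consecutive handled statements.
using Region = std::pair<std::size_t, std::size_t>;

// Split a basic block into maximal runs of handled statements.  Each
// unhandled statement ends the current region and is stepped over, which
// corresponds to the "Skip leading unhandled stmts" and "Skip the
// unhandled stmt" steps in the patch.
std::vector<Region>
split_into_regions (const std::vector<Stmt> &bb)
{
  std::vector<Region> regions;
  std::size_t i = 0;
  while (i < bb.size ())
    {
      std::size_t begin = i;
      while (i < bb.size () && bb[i].handled)
        ++i;
      if (i != begin)
        regions.push_back ({begin, i});
      if (i < bb.size ())
        ++i;  // step over the statement that ended the region
    }
  assert (i == bb.size ());
  return regions;
}

// Simplified shape of foo () in bb-slp-38.c, one entry per source
// statement: four stores, the call to bar (), four more stores.
std::vector<Region>
example_regions ()
{
  std::vector<Stmt> bb (9, Stmt{true});
  bb[4].handled = false;  // the call
  return split_into_regions (bb);
}
```

With that input the sketch yields two regions, one on each side of the call, which is the situation the new testcase checks for with its two "basic block part vectorized" dump lines.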
* Re: [PATCH] Make BB vectorizer work on sub-BBs
  2015-11-06 11:10 ` Richard Biener
@ 2015-11-06 11:12   ` Kyrill Tkachov
  2015-11-06 11:27     ` Richard Biener
  2015-11-10 12:56     ` Christophe Lyon
  2015-11-06 16:13   ` Jeff Law
  1 sibling, 2 replies; 8+ messages in thread
From: Kyrill Tkachov @ 2015-11-06 11:12 UTC (permalink / raw)
  To: Richard Biener, gcc-patches

Hi Richard,

On 06/11/15 11:09, Richard Biener wrote:
> On Fri, 6 Nov 2015, Richard Biener wrote:
>
>> The following patch makes the BB vectorizer not only handle BB heads
>> (until the first stmt with a data reference it cannot handle) but
>> arbitrary regions in a BB separated by such stmts.
>>
>> This improves the number of BB vectorizations from 469 to 556
>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> 1x481.wrf failing both patched and unpatched (have to update my
>> config used for such experiments it seems ...)
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>
>> I'm currently re-testing for a cosmetic change I made when writing
>> the changelog.
>>
>> I expected (and there are) some issues with compile-time. Left
>> is unpatched and right is patched.
>>
>> '403.gcc': 00:00:54 (54)        | '403.gcc': 00:00:55 (55)
>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>> '416.gamess': 00:02:36 (156)    | '416.gamess': 00:02:37 (157)
>> '435.gromacs': 00:00:18 (18)    | '435.gromacs': 00:00:19 (19)
>> '447.dealII': 00:01:31 (91)     | '447.dealII': 00:01:33 (93)
>> '453.povray': 00:04:54 (294)    | '453.povray': 00:08:54 (534)
>> '454.calculix': 00:00:34 (34)   | '454.calculix': 00:00:52 (52)
>> '481.wrf': 00:01:57 (117)       | '481.wrf': 00:01:59 (119)
>>
>> other benchmarks are unchanged. I'm double-checking now that a followup
>> patch I have which re-implements BB vectorization dependence checking
>> fixes this (that's the only quadraticness I know of).
> Fixes all but
>
> '453.povray': 00:04:54 (294)    | '453.povray': 00:06:46 (406)

Note that povray is currently suffering from PR 68198

Kyrill

>
> it even improves compile-time on some:
>
> '464.h264ref': 00:00:26 (26)    | '464.h264ref': 00:00:21 (21)
>
> it also increases the number of vectorized BBs to 722.
>
> Needs some work still though.
>
> Richard.
>
>> Richard.
>>
>> 2015-11-06 Richard Biener <rguenther@suse.de>
>>
>>     * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>>     members.
>>     (vect_stmt_in_region_p): Declare.
>>     * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>>     (destroy_bb_vec_info): Likewise.
>>     (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>>     (vect_get_and_check_slp_defs): Likewise.
>>     (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>>     (vect_slp_bb): Likewise.
>>     * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>>     in terms of vect_stmt_in_region_p.
>>     (vect_pattern_recog): Iterate over the BB region.
>>     * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p.
>>     * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>>     (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>>
>>     * config/i386/i386.c: Include gimple-iterator.h.
>>     * config/aarch64/aarch64.c: Likewise.
>>
>>     * gcc.dg/vect/bb-slp-38.c: New testcase.
>>
>> Index: gcc/tree-vectorizer.h
>> ===================================================================
>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
>> *************** nested_in_vect_loop_p (struct loop *loop
>> *** 390,395 ****
>> --- 390,397 ----
>>   typedef struct _bb_vec_info : public vec_info
>>   {
>>     basic_block bb;
>> +   gimple_stmt_iterator region_begin;
>> +   gimple_stmt_iterator region_end;
>>   } *bb_vec_info;
>>
>>   #define BB_VINFO_BB(B) (B)->bb
>> *************** void vect_pattern_recog (vec_info *);
>> *** 1085,1089 ****
>> --- 1087,1092 ----
>>   /* In tree-vectorizer.c.
*/ >> unsigned vectorize_loops (void); >> void vect_destroy_datarefs (vec_info *); >> + bool vect_stmt_in_region_p (vec_info *, gimple *); >> >> #endif /* GCC_TREE_VECTORIZER_H */ >> Index: gcc/tree-vect-slp.c >> =================================================================== >> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100 >> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100 >> *************** vect_get_and_check_slp_defs (vec_info *v >> *** 209,215 **** >> unsigned int i, number_of_oprnds; >> gimple *def_stmt; >> enum vect_def_type dt = vect_uninitialized_def; >> - struct loop *loop = NULL; >> bool pattern = false; >> slp_oprnd_info oprnd_info; >> int first_op_idx = 1; >> --- 209,214 ---- >> *************** vect_get_and_check_slp_defs (vec_info *v >> *** 218,226 **** >> bool first = stmt_num == 0; >> bool second = stmt_num == 1; >> >> - if (is_a <loop_vec_info> (vinfo)) >> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo)); >> - >> if (is_gimple_call (stmt)) >> { >> number_of_oprnds = gimple_call_num_args (stmt); >> --- 217,222 ---- >> *************** again: >> *** 276,286 **** >> from the pattern. Check that all the stmts of the node are in the >> pattern. */ >> if (def_stmt && gimple_bb (def_stmt) >> ! && ((is_a <loop_vec_info> (vinfo) >> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))) >> ! || (is_a <bb_vec_info> (vinfo) >> ! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb >> ! && gimple_code (def_stmt) != GIMPLE_PHI)) >> && vinfo_for_stmt (def_stmt) >> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) >> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) >> --- 272,278 ---- >> from the pattern. Check that all the stmts of the node are in the >> pattern. */ >> if (def_stmt && gimple_bb (def_stmt) >> ! 
&& vect_stmt_in_region_p (vinfo, def_stmt) >> && vinfo_for_stmt (def_stmt) >> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) >> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) >> *************** vect_detect_hybrid_slp (loop_vec_info lo >> *** 2076,2091 **** >> stmt_vec_info structs for all the stmts in it. */ >> >> static bb_vec_info >> ! new_bb_vec_info (basic_block bb) >> { >> bb_vec_info res = NULL; >> gimple_stmt_iterator gsi; >> >> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >> res->kind = vec_info::bb; >> BB_VINFO_BB (res) = bb; >> >> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> gimple *stmt = gsi_stmt (gsi); >> gimple_set_uid (stmt, 0); >> --- 2068,2088 ---- >> stmt_vec_info structs for all the stmts in it. */ >> >> static bb_vec_info >> ! new_bb_vec_info (gimple_stmt_iterator region_begin, >> ! gimple_stmt_iterator region_end) >> { >> + basic_block bb = gsi_bb (region_begin); >> bb_vec_info res = NULL; >> gimple_stmt_iterator gsi; >> >> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >> res->kind = vec_info::bb; >> BB_VINFO_BB (res) = bb; >> + res->region_begin = region_begin; >> + res->region_end = region_end; >> >> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end); >> ! gsi_next (&gsi)) >> { >> gimple *stmt = gsi_stmt (gsi); >> gimple_set_uid (stmt, 0); >> *************** destroy_bb_vec_info (bb_vec_info bb_vinf >> *** 2118,2124 **** >> >> bb = BB_VINFO_BB (bb_vinfo); >> >> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >> { >> gimple *stmt = gsi_stmt (si); >> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> --- 2115,2122 ---- >> >> bb = BB_VINFO_BB (bb_vinfo); >> >> ! for (si = bb_vinfo->region_begin; >> ! 
gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si)) >> { >> gimple *stmt = gsi_stmt (si); >> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >> *************** destroy_bb_vec_info (bb_vec_info bb_vinf >> *** 2126,2131 **** >> --- 2124,2132 ---- >> if (stmt_info) >> /* Free stmt_vec_info. */ >> free_stmt_vec_info (stmt); >> + >> + /* Reset region marker. */ >> + gimple_set_uid (stmt, -1); >> } >> >> vect_destroy_datarefs (bb_vinfo); >> *************** vect_bb_slp_scalar_cost (basic_block bb, >> *** 2247,2254 **** >> gimple *use_stmt; >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) >> if (!is_gimple_debug (use_stmt) >> ! && (gimple_code (use_stmt) == GIMPLE_PHI >> ! || gimple_bb (use_stmt) != bb >> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt)))) >> { >> (*life)[i] = true; >> --- 2248,2255 ---- >> gimple *use_stmt; >> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) >> if (!is_gimple_debug (use_stmt) >> ! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo, >> ! use_stmt) >> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt)))) >> { >> (*life)[i] = true; >> *************** vect_bb_vectorization_profitable_p (bb_v >> *** 2327,2366 **** >> /* Check if the basic block can be vectorized. */ >> >> static bb_vec_info >> ! vect_slp_analyze_bb_1 (basic_block bb) >> { >> bb_vec_info bb_vinfo; >> vec<slp_instance> slp_instances; >> slp_instance instance; >> int i; >> int min_vf = 2; >> - unsigned n_stmts = 0; >> >> ! bb_vinfo = new_bb_vec_info (bb); >> if (!bb_vinfo) >> return NULL; >> >> ! /* Gather all data references in the basic-block. */ >> ! >> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb); >> ! !gsi_end_p (gsi); gsi_next (&gsi)) >> ! { >> ! gimple *stmt = gsi_stmt (gsi); >> ! if (is_gimple_debug (stmt)) >> ! continue; >> ! ++n_stmts; >> ! if (!find_data_references_in_stmt (NULL, stmt, >> ! &BB_VINFO_DATAREFS (bb_vinfo))) >> ! { >> ! /* Mark the rest of the basic-block as unvectorizable. */ >> ! 
for (; !gsi_end_p (gsi); gsi_next (&gsi)) >> ! { >> ! stmt = gsi_stmt (gsi); >> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false; >> ! } >> ! break; >> ! } >> ! } >> >> /* Analyze the data references. */ >> >> --- 2328,2358 ---- >> /* Check if the basic block can be vectorized. */ >> >> static bb_vec_info >> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin, >> ! gimple_stmt_iterator region_end, >> ! vec<data_reference_p> datarefs, int n_stmts) >> { >> bb_vec_info bb_vinfo; >> vec<slp_instance> slp_instances; >> slp_instance instance; >> int i; >> int min_vf = 2; >> >> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) >> ! { >> ! if (dump_enabled_p ()) >> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> ! "not vectorized: too many instructions in " >> ! "basic block.\n"); >> ! free_data_refs (datarefs); >> ! return NULL; >> ! } >> ! >> ! bb_vinfo = new_bb_vec_info (region_begin, region_end); >> if (!bb_vinfo) >> return NULL; >> >> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs; >> >> /* Analyze the data references. */ >> >> *************** vect_slp_analyze_bb_1 (basic_block bb) >> *** 2438,2445 **** >> } >> >> /* Mark all the statements that we do not want to vectorize. */ >> ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo)); >> ! !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); >> if (STMT_SLP_TYPE (vinfo) != pure_slp) >> --- 2430,2437 ---- >> } >> >> /* Mark all the statements that we do not want to vectorize. */ >> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin; >> ! 
gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next (&gsi)) >> { >> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); >> if (STMT_SLP_TYPE (vinfo) != pure_slp) >> *************** bool >> *** 2509,2585 **** >> vect_slp_bb (basic_block bb) >> { >> bb_vec_info bb_vinfo; >> - int insns = 0; >> gimple_stmt_iterator gsi; >> unsigned int vector_sizes; >> >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n"); >> >> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >> - { >> - gimple *stmt = gsi_stmt (gsi); >> - if (!is_gimple_debug (stmt) >> - && !gimple_nop_p (stmt) >> - && gimple_code (stmt) != GIMPLE_LABEL) >> - insns++; >> - if (gimple_location (stmt) != UNKNOWN_LOCATION) >> - vect_location = gimple_location (stmt); >> - } >> - >> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) >> - { >> - if (dump_enabled_p ()) >> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> - "not vectorized: too many instructions in " >> - "basic block.\n"); >> - >> - return false; >> - } >> - >> /* Autodetect first vector size we try. */ >> current_vector_size = 0; >> vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >> >> while (1) >> { >> ! bb_vinfo = vect_slp_analyze_bb_1 (bb); >> ! if (bb_vinfo) >> { >> ! if (!dbg_cnt (vect_slp)) >> ! { >> ! destroy_bb_vec_info (bb_vinfo); >> ! return false; >> ! } >> >> if (dump_enabled_p ()) >> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n"); >> >> vect_schedule_slp (bb_vinfo); >> >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, >> ! "BASIC BLOCK VECTORIZED\n"); >> >> destroy_bb_vec_info (bb_vinfo); >> >> ! return true; >> } >> >> ! destroy_bb_vec_info (bb_vinfo); >> >> vector_sizes &= ~current_vector_size; >> ! if (vector_sizes == 0 >> ! || current_vector_size == 0) >> ! return false; >> >> ! /* Try the next biggest vector size. */ >> ! current_vector_size = 1 << floor_log2 (vector_sizes); >> ! 
if (dump_enabled_p ()) >> ! dump_printf_loc (MSG_NOTE, vect_location, >> ! "***** Re-trying analysis with " >> ! "vector size %d\n", current_vector_size); >> } >> } >> >> >> --- 2501,2605 ---- >> vect_slp_bb (basic_block bb) >> { >> bb_vec_info bb_vinfo; >> gimple_stmt_iterator gsi; >> unsigned int vector_sizes; >> + bool any_vectorized = false; >> >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, "===vect_slp_analyze_bb===\n"); >> >> /* Autodetect first vector size we try. */ >> current_vector_size = 0; >> vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >> >> + gsi = gsi_start_bb (bb); >> + >> while (1) >> { >> ! if (gsi_end_p (gsi)) >> ! break; >> ! >> ! gimple_stmt_iterator region_begin = gsi; >> ! vec<data_reference_p> datarefs = vNULL; >> ! int insns = 0; >> ! >> ! for (; !gsi_end_p (gsi); gsi_next (&gsi)) >> { >> ! gimple *stmt = gsi_stmt (gsi); >> ! if (is_gimple_debug (stmt)) >> ! continue; >> ! insns++; >> ! >> ! if (gimple_location (stmt) != UNKNOWN_LOCATION) >> ! vect_location = gimple_location (stmt); >> ! >> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs)) >> ! break; >> ! } >> ! >> ! /* Skip leading unhandled stmts. */ >> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi)) >> ! { >> ! gsi_next (&gsi); >> ! continue; >> ! } >> ! >> ! gimple_stmt_iterator region_end = gsi; >> >> + bool vectorized = false; >> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end, >> + datarefs, insns); >> + if (bb_vinfo >> + && dbg_cnt (vect_slp)) >> + { >> if (dump_enabled_p ()) >> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n"); >> >> vect_schedule_slp (bb_vinfo); >> >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_NOTE, vect_location, >> ! "basic block part vectorized\n"); >> >> destroy_bb_vec_info (bb_vinfo); >> >> ! vectorized = true; >> } >> + else >> + destroy_bb_vec_info (bb_vinfo); >> >> ! any_vectorized |= vectorized; >> >> vector_sizes &= ~current_vector_size; >> ! if (vectorized >> ! 
|| vector_sizes == 0 >> ! || current_vector_size == 0) >> ! { >> ! if (gsi_end_p (region_end)) >> ! break; >> ! >> ! /* Skip the unhandled stmt. */ >> ! gsi_next (&gsi); >> ! >> ! /* And reset vector sizes. */ >> ! current_vector_size = 0; >> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >> ! } >> ! else >> ! { >> ! /* Try the next biggest vector size. */ >> ! current_vector_size = 1 << floor_log2 (vector_sizes); >> ! if (dump_enabled_p ()) >> ! dump_printf_loc (MSG_NOTE, vect_location, >> ! "***** Re-trying analysis with " >> ! "vector size %d\n", current_vector_size); >> >> ! /* Start over. */ >> ! gsi = region_begin; >> ! } >> } >> + >> + return any_vectorized; >> } >> >> >> Index: gcc/tree-vect-patterns.c >> =================================================================== >> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100 >> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100 >> *************** static bool >> *** 107,133 **** >> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) >> { >> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); >> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); >> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); >> ! >> ! if (!gimple_bb (stmt2)) >> ! return false; >> ! >> ! if (loop_vinfo) >> ! { >> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2))) >> ! return false; >> ! } >> ! else >> ! { >> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo) >> ! || gimple_code (stmt2) == GIMPLE_PHI) >> ! return false; >> ! } >> ! >> ! gcc_assert (vinfo_for_stmt (stmt2)); >> ! return true; >> } >> >> /* If the LHS of DEF_STMT has a single use, and that statement is >> --- 107,113 ---- >> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) >> { >> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); >> ! 
return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2); >> } >> >> /* If the LHS of DEF_STMT has a single use, and that statement is >> *************** vect_pattern_recog (vec_info *vinfo) >> *** 3611,3643 **** >> loop = LOOP_VINFO_LOOP (loop_vinfo); >> bbs = LOOP_VINFO_BBS (loop_vinfo); >> nbbs = loop->num_nodes; >> } >> else >> { >> ! bbs = &as_a <bb_vec_info> (vinfo)->bb; >> ! nbbs = 1; >> ! } >> ! >> ! /* Scan through the loop stmts, applying the pattern recognition >> ! functions starting at each stmt visited: */ >> ! for (i = 0; i < nbbs; i++) >> ! { >> ! basic_block bb = bbs[i]; >> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >> ! { >> ! if (is_a <bb_vec_info> (vinfo) >> ! && (stmt = gsi_stmt (si)) >> && vinfo_for_stmt (stmt) >> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >> ! continue; >> >> ! /* Scan over all generic vect_recog_xxx_pattern functions. */ >> ! for (j = 0; j < NUM_PATTERNS; j++) >> ! { >> vect_recog_func = vect_vect_recog_func_ptrs[j]; >> vect_pattern_recog_1 (vect_recog_func, si, >> &stmts_to_replace); >> ! } >> ! } >> } >> } >> --- 3591,3632 ---- >> loop = LOOP_VINFO_LOOP (loop_vinfo); >> bbs = LOOP_VINFO_BBS (loop_vinfo); >> nbbs = loop->num_nodes; >> + >> + /* Scan through the loop stmts, applying the pattern recognition >> + functions starting at each stmt visited: */ >> + for (i = 0; i < nbbs; i++) >> + { >> + basic_block bb = bbs[i]; >> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >> + { >> + /* Scan over all generic vect_recog_xxx_pattern functions. */ >> + for (j = 0; j < NUM_PATTERNS; j++) >> + { >> + vect_recog_func = vect_vect_recog_func_ptrs[j]; >> + vect_pattern_recog_1 (vect_recog_func, si, >> + &stmts_to_replace); >> + } >> + } >> + } >> } >> else >> { >> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); >> ! for (si = bb_vinfo->region_begin; >> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si)) >> ! { >> ! 
if ((stmt = gsi_stmt (si)) >> && vinfo_for_stmt (stmt) >> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >> ! continue; >> >> ! /* Scan over all generic vect_recog_xxx_pattern functions. */ >> ! for (j = 0; j < NUM_PATTERNS; j++) >> ! { >> vect_recog_func = vect_vect_recog_func_ptrs[j]; >> vect_pattern_recog_1 (vect_recog_func, si, >> &stmts_to_replace); >> ! } >> ! } >> } >> } >> Index: gcc/config/i386/i386.c >> =================================================================== >> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100 >> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100 >> *************** along with GCC; see the file COPYING3. >> *** 64,69 **** >> --- 64,70 ---- >> #include "context.h" >> #include "pass_manager.h" >> #include "target-globals.h" >> + #include "gimple-iterator.h" >> #include "tree-vectorizer.h" >> #include "shrink-wrap.h" >> #include "builtins.h" >> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c >> =================================================================== >> *** /dev/null 1970-01-01 00:00:00.000000000 +0000 >> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 +0100 >> *************** >> *** 0 **** >> --- 1,44 ---- >> + /* { dg-require-effective-target vect_int } */ >> + >> + #include "tree-vect.h" >> + >> + extern void abort (void); >> + >> + int a[8], b[8]; >> + int x; >> + >> + void __attribute__((noinline,noclone)) >> + bar (void) >> + { >> + x = 1; >> + } >> + >> + void __attribute__((noinline,noclone)) >> + foo(void) >> + { >> + a[0] = b[0]; >> + a[1] = b[0]; >> + a[2] = b[3]; >> + a[3] = b[3]; >> + bar (); >> + a[4] = b[4]; >> + a[5] = b[7]; >> + a[6] = b[4]; >> + a[7] = b[7]; >> + } >> + >> + int main() >> + { >> + int i; >> + check_vect (); >> + for (i = 0; i < 8; ++i) >> + b[i] = i; >> + foo (); >> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3 >> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7) >> + abort (); >> + return 0; >> + } >> + >> + /* { 
dg-final { scan-tree-dump "basic block vectorized" "slp2" { target vect_perm } } } */ >> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" { target vect_perm } } } */ >> Index: gcc/tree-vect-stmts.c >> =================================================================== >> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100 >> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100 >> *************** vect_is_simple_use (tree operand, vec_in >> *** 8196,8207 **** >> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >> } >> >> ! basic_block bb = gimple_bb (*def_stmt); >> ! if ((is_a <loop_vec_info> (vinfo) >> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, bb)) >> ! || (is_a <bb_vec_info> (vinfo) >> ! && (bb != as_a <bb_vec_info> (vinfo)->bb >> ! || gimple_code (*def_stmt) == GIMPLE_PHI))) >> *dt = vect_external_def; >> else >> { >> --- 8196,8202 ---- >> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >> } >> >> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt)) >> *dt = vect_external_def; >> else >> { >> Index: gcc/tree-vectorizer.c >> =================================================================== >> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100 >> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100 >> *************** vect_destroy_datarefs (vec_info *vinfo) >> *** 350,355 **** >> --- 350,382 ---- >> } >> >> >> + /* Return whether STMT is inside the region we try to vectorize. 
*/ >> + >> + bool >> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt) >> + { >> + if (!gimple_bb (stmt)) >> + return false; >> + >> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo)) >> + { >> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt))) >> + return false; >> + } >> + else >> + { >> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); >> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo) >> + || gimple_uid (stmt) == -1U >> + || gimple_code (stmt) == GIMPLE_PHI) >> + return false; >> + } >> + >> + return true; >> + } >> + >> + >> /* If LOOP has been versioned during ifcvt, return the internal call >> guarding it. */ >> >> *************** pass_slp_vectorize::execute (function *f >> *** 692,697 **** >> --- 719,732 ---- >> scev_initialize (); >> } >> >> + /* Mark all stmts as not belonging to the current region. */ >> + FOR_EACH_BB_FN (bb, fun) >> + { >> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p (gsi); >> + gsi_next (&gsi)) >> + gimple_set_uid (gsi_stmt (gsi), -1); >> + } >> + >> init_stmt_vec_info_vec (); >> >> FOR_EACH_BB_FN (bb, fun) >> Index: gcc/config/aarch64/aarch64.c >> =================================================================== >> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 +0100 >> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100 >> *************** >> *** 52,57 **** >> --- 52,58 ---- >> #include "params.h" >> #include "gimplify.h" >> #include "dwarf2.h" >> + #include "gimple-iterator.h" >> #include "tree-vectorizer.h" >> #include "aarch64-cost-tables.h" >> #include "dumpfile.h" >> ^ permalink raw reply [flat|nested] 8+ messages in thread
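As a rough illustration of the region splitting this patch describes (this is not GCC code): the sketch below models a basic block as an array of statements, each flagged as handled or not by data-reference analysis, and cuts it into maximal runs of handled statements, skipping the statement that ended each run. The names toy_stmt, toy_region and toy_split_regions are hypothetical and exist only for this example; the real vect_slp_bb walks gimple_stmt_iterators and calls find_data_references_in_stmt instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GIMPLE statement: all we model is whether
   data-reference analysis could handle it.  */
struct toy_stmt { bool handled; };

/* Half-open range [begin, end) of statement indices forming one region.  */
struct toy_region { size_t begin, end; };

/* Split STMTS[0..N) into maximal runs of handled statements.  At most
   MAX_OUT regions are stored in OUT; the return value is the total number
   of regions found, which may exceed MAX_OUT.  */
size_t
toy_split_regions (const struct toy_stmt *stmts, size_t n,
                   struct toy_region *out, size_t max_out)
{
  size_t count = 0;
  size_t i = 0;

  assert (stmts != NULL || n == 0);

  while (i < n)
    {
      size_t begin = i;

      /* Extend the region until the first unhandled statement.  */
      while (i < n && stmts[i].handled)
        i++;

      /* Skip a leading unhandled statement: the region would be empty.  */
      if (i == begin)
        {
          i++;
          continue;
        }

      if (count < max_out)
        out[count] = (struct toy_region) { begin, i };
      count++;

      /* Skip the unhandled statement that terminated this region.  */
      if (i < n)
        i++;
    }

  return count;
}
```

With four handled statements, one unhandled call and four more handled statements (roughly the shape of foo in bb-slp-38.c, ignoring that each source assignment becomes more than one GIMPLE stmt), this yields the two regions [0, 4) and [5, 9), which is consistent with the testcase scanning for "basic block part vectorized" twice.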
* Re: [PATCH] Make BB vectorizer work on sub-BBs 2015-11-06 11:12 ` Kyrill Tkachov @ 2015-11-06 11:27 ` Richard Biener 2015-11-10 12:56 ` Christophe Lyon 1 sibling, 0 replies; 8+ messages in thread From: Richard Biener @ 2015-11-06 11:27 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: gcc-patches On Fri, 6 Nov 2015, Kyrill Tkachov wrote: > Hi Richard, > > On 06/11/15 11:09, Richard Biener wrote: > > On Fri, 6 Nov 2015, Richard Biener wrote: > > > > > The following patch makes the BB vectorizer not only handle BB heads > > > (until the first stmt with a data reference it cannot handle) but > > > arbitrary regions in a BB separated by such stmts. > > > > > > This improves the number of BB vectorizations from 469 to 556 > > > in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and > > > 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray > > > 1x481.wrf failing both patched and unpatched (have to update my > > > config used for such experiments it seems ...) > > > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. > > > > > > I'm currently re-testing for a cosmetic change I made when writing > > > the changelog. > > > > > > I expected (and there are) some issues with compile-time. Left > > > is unpatched and right is patched. > > > > > > '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) > > > '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) > > > '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) > > > '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) > > > '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) > > > '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) > > > '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) > > > '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) > > > > > > other benchmarks are unchanged. 
I'm double-checking now that a followup
> > > patch I have which re-implements BB vectorization dependence checking
> > > fixes this (that's the only quadraticness I know of).
> > Fixes all but
> >
> > '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>
> Note that povray is currently suffering from PR 68198

Ah, yeah. Seems to run into

/space/rguenther/install-trunk/usr/local/bin/g++ -c -o fnpovfpu.o -DSPEC_CPU -DNDEBUG -Ofast -fopt-info-vec -ftime-report -Wl,-rpath=/abuild/rguenther/install-trunk/usr/local/lib64 -DSPEC_CPU_LP64 -Wno-multichar fnpovfpu.cpp
g++: internal compiler error: Killed (program cc1plus)
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.
specmake: *** [fnpovfpu.o] Error 4

and dmesg

[7525617.394116] Out of memory: Kill process 31426 (cc1plus) score 832 or sacrifice child
[7525617.394117] Killed process 31426 (cc1plus) total-vm:8399700kB, anon-rss:6790020kB, file-rss:1584kB

for me (and that's the one taking all the time).

I can imagine that with many basic-blocks the patch might end up as a net slowdown still.

I'll try to investigate anyway, maybe I'm leaking sth.

Richard.

> Kyrill
>
> >
> > it even improves compile-time on some:
> >
> > '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
> >
> > it also increases the number of vectorized BBs to 722.
> >
> > Needs some work still though.
> >
> > Richard.
> >
> > > Richard.
> > >
> > > 2015-11-06 Richard Biener <rguenther@suse.de>
> > >
> > > * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
> > > members.
> > > (vect_stmt_in_region_p): Declare.
> > > * tree-vect-slp.c (new_bb_vec_info): Work on a region.
> > > (destroy_bb_vec_info): Likewise.
> > > (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
> > > (vect_get_and_check_slp_defs): Likewise.
> > > (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
> > > (vect_slp_bb): Likewise.
> > > * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement > > > in terms of vect_stmt_in_region_p. > > > (vect_pattern_recog): Iterate over the BB region. > > > * tree-vect-stmts.c (vect_is_simple_use): Use vect_stmt_in_region_p. > > > * tree-vectorizer.c (vect_stmt_in_region_p): New function. > > > (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1. > > > > > > * config/i386/i386.c: Include gimple-iterator.h. > > > * config/aarch64/aarch64.c: Likewise. > > > > > > * gcc.dg/vect/bb-slp-38.c: New testcase. > > > > > > Index: gcc/tree-vectorizer.h > > > =================================================================== > > > *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 > > > --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 > > > *************** nested_in_vect_loop_p (struct loop *loop > > > *** 390,395 **** > > > --- 390,397 ---- > > > typedef struct _bb_vec_info : public vec_info > > > { > > > basic_block bb; > > > + gimple_stmt_iterator region_begin; > > > + gimple_stmt_iterator region_end; > > > } *bb_vec_info; > > > #define BB_VINFO_BB(B) (B)->bb > > > *************** void vect_pattern_recog (vec_info *); > > > *** 1085,1089 **** > > > --- 1087,1092 ---- > > > /* In tree-vectorizer.c. 
*/ > > > unsigned vectorize_loops (void); > > > void vect_destroy_datarefs (vec_info *); > > > + bool vect_stmt_in_region_p (vec_info *, gimple *); > > > #endif /* GCC_TREE_VECTORIZER_H */ > > > Index: gcc/tree-vect-slp.c > > > =================================================================== > > > *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100 > > > --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100 > > > *************** vect_get_and_check_slp_defs (vec_info *v > > > *** 209,215 **** > > > unsigned int i, number_of_oprnds; > > > gimple *def_stmt; > > > enum vect_def_type dt = vect_uninitialized_def; > > > - struct loop *loop = NULL; > > > bool pattern = false; > > > slp_oprnd_info oprnd_info; > > > int first_op_idx = 1; > > > --- 209,214 ---- > > > *************** vect_get_and_check_slp_defs (vec_info *v > > > *** 218,226 **** > > > bool first = stmt_num == 0; > > > bool second = stmt_num == 1; > > > - if (is_a <loop_vec_info> (vinfo)) > > > - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo)); > > > - > > > if (is_gimple_call (stmt)) > > > { > > > number_of_oprnds = gimple_call_num_args (stmt); > > > --- 217,222 ---- > > > *************** again: > > > *** 276,286 **** > > > from the pattern. Check that all the stmts of the node are in > > > the > > > pattern. */ > > > if (def_stmt && gimple_bb (def_stmt) > > > ! && ((is_a <loop_vec_info> (vinfo) > > > ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))) > > > ! || (is_a <bb_vec_info> (vinfo) > > > ! && gimple_bb (def_stmt) == as_a <bb_vec_info> (vinfo)->bb > > > ! && gimple_code (def_stmt) != GIMPLE_PHI)) > > > && vinfo_for_stmt (def_stmt) > > > && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) > > > && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) > > > --- 272,278 ---- > > > from the pattern. Check that all the stmts of the node are in > > > the > > > pattern. */ > > > if (def_stmt && gimple_bb (def_stmt) > > > ! 
&& vect_stmt_in_region_p (vinfo, def_stmt) > > > && vinfo_for_stmt (def_stmt) > > > && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) > > > && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) > > > *************** vect_detect_hybrid_slp (loop_vec_info lo > > > *** 2076,2091 **** > > > stmt_vec_info structs for all the stmts in it. */ > > > static bb_vec_info > > > ! new_bb_vec_info (basic_block bb) > > > { > > > bb_vec_info res = NULL; > > > gimple_stmt_iterator gsi; > > > res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); > > > res->kind = vec_info::bb; > > > BB_VINFO_BB (res) = bb; > > > ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > > > { > > > gimple *stmt = gsi_stmt (gsi); > > > gimple_set_uid (stmt, 0); > > > --- 2068,2088 ---- > > > stmt_vec_info structs for all the stmts in it. */ > > > static bb_vec_info > > > ! new_bb_vec_info (gimple_stmt_iterator region_begin, > > > ! gimple_stmt_iterator region_end) > > > { > > > + basic_block bb = gsi_bb (region_begin); > > > bb_vec_info res = NULL; > > > gimple_stmt_iterator gsi; > > > res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); > > > res->kind = vec_info::bb; > > > BB_VINFO_BB (res) = bb; > > > + res->region_begin = region_begin; > > > + res->region_end = region_end; > > > ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end); > > > ! gsi_next (&gsi)) > > > { > > > gimple *stmt = gsi_stmt (gsi); > > > gimple_set_uid (stmt, 0); > > > *************** destroy_bb_vec_info (bb_vec_info bb_vinf > > > *** 2118,2124 **** > > > bb = BB_VINFO_BB (bb_vinfo); > > > ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > > > { > > > gimple *stmt = gsi_stmt (si); > > > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > > --- 2115,2122 ---- > > > bb = BB_VINFO_BB (bb_vinfo); > > > ! for (si = bb_vinfo->region_begin; > > > ! 
gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si)) > > > { > > > gimple *stmt = gsi_stmt (si); > > > stmt_vec_info stmt_info = vinfo_for_stmt (stmt); > > > *************** destroy_bb_vec_info (bb_vec_info bb_vinf > > > *** 2126,2131 **** > > > --- 2124,2132 ---- > > > if (stmt_info) > > > /* Free stmt_vec_info. */ > > > free_stmt_vec_info (stmt); > > > + > > > + /* Reset region marker. */ > > > + gimple_set_uid (stmt, -1); > > > } > > > vect_destroy_datarefs (bb_vinfo); > > > *************** vect_bb_slp_scalar_cost (basic_block bb, > > > *** 2247,2254 **** > > > gimple *use_stmt; > > > FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) > > > if (!is_gimple_debug (use_stmt) > > > ! && (gimple_code (use_stmt) == GIMPLE_PHI > > > ! || gimple_bb (use_stmt) != bb > > > || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt)))) > > > { > > > (*life)[i] = true; > > > --- 2248,2255 ---- > > > gimple *use_stmt; > > > FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR (def_p)) > > > if (!is_gimple_debug (use_stmt) > > > ! && (! vect_stmt_in_region_p (vinfo_for_stmt (stmt)->vinfo, > > > ! use_stmt) > > > || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (use_stmt)))) > > > { > > > (*life)[i] = true; > > > *************** vect_bb_vectorization_profitable_p (bb_v > > > *** 2327,2366 **** > > > /* Check if the basic block can be vectorized. */ > > > static bb_vec_info > > > ! vect_slp_analyze_bb_1 (basic_block bb) > > > { > > > bb_vec_info bb_vinfo; > > > vec<slp_instance> slp_instances; > > > slp_instance instance; > > > int i; > > > int min_vf = 2; > > > - unsigned n_stmts = 0; > > > ! bb_vinfo = new_bb_vec_info (bb); > > > if (!bb_vinfo) > > > return NULL; > > > ! /* Gather all data references in the basic-block. */ > > > ! > > > ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb); > > > ! !gsi_end_p (gsi); gsi_next (&gsi)) > > > ! { > > > ! gimple *stmt = gsi_stmt (gsi); > > > ! if (is_gimple_debug (stmt)) > > > ! continue; > > > ! 
++n_stmts; > > > ! if (!find_data_references_in_stmt (NULL, stmt, > > > ! &BB_VINFO_DATAREFS (bb_vinfo))) > > > ! { > > > ! /* Mark the rest of the basic-block as unvectorizable. */ > > > ! for (; !gsi_end_p (gsi); gsi_next (&gsi)) > > > ! { > > > ! stmt = gsi_stmt (gsi); > > > ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false; > > > ! } > > > ! break; > > > ! } > > > ! } > > > /* Analyze the data references. */ > > > --- 2328,2358 ---- > > > /* Check if the basic block can be vectorized. */ > > > static bb_vec_info > > > ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin, > > > ! gimple_stmt_iterator region_end, > > > ! vec<data_reference_p> datarefs, int n_stmts) > > > { > > > bb_vec_info bb_vinfo; > > > vec<slp_instance> slp_instances; > > > slp_instance instance; > > > int i; > > > int min_vf = 2; > > > ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) > > > ! { > > > ! if (dump_enabled_p ()) > > > ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > ! "not vectorized: too many instructions in " > > > ! "basic block.\n"); > > > ! free_data_refs (datarefs); > > > ! return NULL; > > > ! } > > > ! > > > ! bb_vinfo = new_bb_vec_info (region_begin, region_end); > > > if (!bb_vinfo) > > > return NULL; > > > ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs; > > > /* Analyze the data references. */ > > > *************** vect_slp_analyze_bb_1 (basic_block bb) > > > *** 2438,2445 **** > > > } > > > /* Mark all the statements that we do not want to vectorize. */ > > > ! for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB (bb_vinfo)); > > > ! !gsi_end_p (gsi); gsi_next (&gsi)) > > > { > > > stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); > > > if (STMT_SLP_TYPE (vinfo) != pure_slp) > > > --- 2430,2437 ---- > > > } > > > /* Mark all the statements that we do not want to vectorize. */ > > > ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin; > > > ! 
gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next > > > (&gsi)) > > > { > > > stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); > > > if (STMT_SLP_TYPE (vinfo) != pure_slp) > > > *************** bool > > > *** 2509,2585 **** > > > vect_slp_bb (basic_block bb) > > > { > > > bb_vec_info bb_vinfo; > > > - int insns = 0; > > > gimple_stmt_iterator gsi; > > > unsigned int vector_sizes; > > > if (dump_enabled_p ()) > > > dump_printf_loc (MSG_NOTE, vect_location, > > > "===vect_slp_analyze_bb===\n"); > > > - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > > > - { > > > - gimple *stmt = gsi_stmt (gsi); > > > - if (!is_gimple_debug (stmt) > > > - && !gimple_nop_p (stmt) > > > - && gimple_code (stmt) != GIMPLE_LABEL) > > > - insns++; > > > - if (gimple_location (stmt) != UNKNOWN_LOCATION) > > > - vect_location = gimple_location (stmt); > > > - } > > > - > > > - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) > > > - { > > > - if (dump_enabled_p ()) > > > - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > > > - "not vectorized: too many instructions in " > > > - "basic block.\n"); > > > - > > > - return false; > > > - } > > > - > > > /* Autodetect first vector size we try. */ > > > current_vector_size = 0; > > > vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); > > > while (1) > > > { > > > ! bb_vinfo = vect_slp_analyze_bb_1 (bb); > > > ! if (bb_vinfo) > > > { > > > ! if (!dbg_cnt (vect_slp)) > > > ! { > > > ! destroy_bb_vec_info (bb_vinfo); > > > ! return false; > > > ! } > > > if (dump_enabled_p ()) > > > ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n"); > > > vect_schedule_slp (bb_vinfo); > > > if (dump_enabled_p ()) > > > dump_printf_loc (MSG_NOTE, vect_location, > > > ! "BASIC BLOCK VECTORIZED\n"); > > > destroy_bb_vec_info (bb_vinfo); > > > ! return true; > > > } > > > ! destroy_bb_vec_info (bb_vinfo); > > > vector_sizes &= ~current_vector_size; > > > ! if (vector_sizes == 0 > > > ! 
|| current_vector_size == 0) > > > ! return false; > > > ! /* Try the next biggest vector size. */ > > > ! current_vector_size = 1 << floor_log2 (vector_sizes); > > > ! if (dump_enabled_p ()) > > > ! dump_printf_loc (MSG_NOTE, vect_location, > > > ! "***** Re-trying analysis with " > > > ! "vector size %d\n", current_vector_size); > > > } > > > } > > > --- 2501,2605 ---- > > > vect_slp_bb (basic_block bb) > > > { > > > bb_vec_info bb_vinfo; > > > gimple_stmt_iterator gsi; > > > unsigned int vector_sizes; > > > + bool any_vectorized = false; > > > if (dump_enabled_p ()) > > > dump_printf_loc (MSG_NOTE, vect_location, > > > "===vect_slp_analyze_bb===\n"); > > > /* Autodetect first vector size we try. */ > > > current_vector_size = 0; > > > vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); > > > + gsi = gsi_start_bb (bb); > > > + > > > while (1) > > > { > > > ! if (gsi_end_p (gsi)) > > > ! break; > > > ! > > > ! gimple_stmt_iterator region_begin = gsi; > > > ! vec<data_reference_p> datarefs = vNULL; > > > ! int insns = 0; > > > ! > > > ! for (; !gsi_end_p (gsi); gsi_next (&gsi)) > > > { > > > ! gimple *stmt = gsi_stmt (gsi); > > > ! if (is_gimple_debug (stmt)) > > > ! continue; > > > ! insns++; > > > ! > > > ! if (gimple_location (stmt) != UNKNOWN_LOCATION) > > > ! vect_location = gimple_location (stmt); > > > ! > > > ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs)) > > > ! break; > > > ! } > > > ! > > > ! /* Skip leading unhandled stmts. */ > > > ! if (gsi_stmt (region_begin) == gsi_stmt (gsi)) > > > ! { > > > ! gsi_next (&gsi); > > > ! continue; > > > ! } > > > ! > > > ! gimple_stmt_iterator region_end = gsi; > > > + bool vectorized = false; > > > + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end, > > > + datarefs, insns); > > > + if (bb_vinfo > > > + && dbg_cnt (vect_slp)) > > > + { > > > if (dump_enabled_p ()) > > > ! 
dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB part\n"); > > > vect_schedule_slp (bb_vinfo); > > > if (dump_enabled_p ()) > > > dump_printf_loc (MSG_NOTE, vect_location, > > > ! "basic block part vectorized\n"); > > > destroy_bb_vec_info (bb_vinfo); > > > ! vectorized = true; > > > } > > > + else > > > + destroy_bb_vec_info (bb_vinfo); > > > ! any_vectorized |= vectorized; > > > vector_sizes &= ~current_vector_size; > > > ! if (vectorized > > > ! || vector_sizes == 0 > > > ! || current_vector_size == 0) > > > ! { > > > ! if (gsi_end_p (region_end)) > > > ! break; > > > ! > > > ! /* Skip the unhandled stmt. */ > > > ! gsi_next (&gsi); > > > ! > > > ! /* And reset vector sizes. */ > > > ! current_vector_size = 0; > > > ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); > > > ! } > > > ! else > > > ! { > > > ! /* Try the next biggest vector size. */ > > > ! current_vector_size = 1 << floor_log2 (vector_sizes); > > > ! if (dump_enabled_p ()) > > > ! dump_printf_loc (MSG_NOTE, vect_location, > > > ! "***** Re-trying analysis with " > > > ! "vector size %d\n", current_vector_size); > > > ! /* Start over. */ > > > ! gsi = region_begin; > > > ! } > > > } > > > + > > > + return any_vectorized; > > > } > > > Index: gcc/tree-vect-patterns.c > > > =================================================================== > > > *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 +0100 > > > --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100 > > > *************** static bool > > > *** 107,133 **** > > > vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) > > > { > > > stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); > > > ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); > > > ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); > > > ! > > > ! if (!gimple_bb (stmt2)) > > > ! return false; > > > ! > > > ! if (loop_vinfo) > > > ! { > > > ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > > > ! 
if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2))) > > > ! return false; > > > ! } > > > ! else > > > ! { > > > ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo) > > > ! || gimple_code (stmt2) == GIMPLE_PHI) > > > ! return false; > > > ! } > > > ! > > > ! gcc_assert (vinfo_for_stmt (stmt2)); > > > ! return true; > > > } > > > /* If the LHS of DEF_STMT has a single use, and that statement is > > > --- 107,113 ---- > > > vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) > > > { > > > stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); > > > ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2); > > > } > > > /* If the LHS of DEF_STMT has a single use, and that statement is > > > *************** vect_pattern_recog (vec_info *vinfo) > > > *** 3611,3643 **** > > > loop = LOOP_VINFO_LOOP (loop_vinfo); > > > bbs = LOOP_VINFO_BBS (loop_vinfo); > > > nbbs = loop->num_nodes; > > > } > > > else > > > { > > > ! bbs = &as_a <bb_vec_info> (vinfo)->bb; > > > ! nbbs = 1; > > > ! } > > > ! > > > ! /* Scan through the loop stmts, applying the pattern recognition > > > ! functions starting at each stmt visited: */ > > > ! for (i = 0; i < nbbs; i++) > > > ! { > > > ! basic_block bb = bbs[i]; > > > ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > > > ! { > > > ! if (is_a <bb_vec_info> (vinfo) > > > ! && (stmt = gsi_stmt (si)) > > > && vinfo_for_stmt (stmt) > > > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) > > > ! continue; > > > ! /* Scan over all generic vect_recog_xxx_pattern functions. > > > */ > > > ! for (j = 0; j < NUM_PATTERNS; j++) > > > ! { > > > vect_recog_func = vect_vect_recog_func_ptrs[j]; > > > vect_pattern_recog_1 (vect_recog_func, si, > > > &stmts_to_replace); > > > ! } > > > ! 
} > > > } > > > } > > > --- 3591,3632 ---- > > > loop = LOOP_VINFO_LOOP (loop_vinfo); > > > bbs = LOOP_VINFO_BBS (loop_vinfo); > > > nbbs = loop->num_nodes; > > > + > > > + /* Scan through the loop stmts, applying the pattern recognition > > > + functions starting at each stmt visited: */ > > > + for (i = 0; i < nbbs; i++) > > > + { > > > + basic_block bb = bbs[i]; > > > + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) > > > + { > > > + /* Scan over all generic vect_recog_xxx_pattern functions. */ > > > + for (j = 0; j < NUM_PATTERNS; j++) > > > + { > > > + vect_recog_func = vect_vect_recog_func_ptrs[j]; > > > + vect_pattern_recog_1 (vect_recog_func, si, > > > + &stmts_to_replace); > > > + } > > > + } > > > + } > > > } > > > else > > > { > > > ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); > > > ! for (si = bb_vinfo->region_begin; > > > ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next (&si)) > > > ! { > > > ! if ((stmt = gsi_stmt (si)) > > > && vinfo_for_stmt (stmt) > > > && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) > > > ! continue; > > > ! /* Scan over all generic vect_recog_xxx_pattern functions. > > > */ > > > ! for (j = 0; j < NUM_PATTERNS; j++) > > > ! { > > > vect_recog_func = vect_vect_recog_func_ptrs[j]; > > > vect_pattern_recog_1 (vect_recog_func, si, > > > &stmts_to_replace); > > > ! } > > > ! } > > > } > > > } > > > Index: gcc/config/i386/i386.c > > > =================================================================== > > > *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100 > > > --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100 > > > *************** along with GCC; see the file COPYING3. 
> > > *** 64,69 **** > > > --- 64,70 ---- > > > #include "context.h" > > > #include "pass_manager.h" > > > #include "target-globals.h" > > > + #include "gimple-iterator.h" > > > #include "tree-vectorizer.h" > > > #include "shrink-wrap.h" > > > #include "builtins.h" > > > Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c > > > =================================================================== > > > *** /dev/null 1970-01-01 00:00:00.000000000 +0000 > > > --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 14:00:48.177644327 > > > +0100 > > > *************** > > > *** 0 **** > > > --- 1,44 ---- > > > + /* { dg-require-effective-target vect_int } */ > > > + > > > + #include "tree-vect.h" > > > + > > > + extern void abort (void); > > > + > > > + int a[8], b[8]; > > > + int x; > > > + > > > + void __attribute__((noinline,noclone)) > > > + bar (void) > > > + { > > > + x = 1; > > > + } > > > + > > > + void __attribute__((noinline,noclone)) > > > + foo(void) > > > + { > > > + a[0] = b[0]; > > > + a[1] = b[0]; > > > + a[2] = b[3]; > > > + a[3] = b[3]; > > > + bar (); > > > + a[4] = b[4]; > > > + a[5] = b[7]; > > > + a[6] = b[4]; > > > + a[7] = b[7]; > > > + } > > > + > > > + int main() > > > + { > > > + int i; > > > + check_vect (); > > > + for (i = 0; i < 8; ++i) > > > + b[i] = i; > > > + foo (); > > > + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3 > > > + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7) > > > + abort (); > > > + return 0; > > > + } > > > + > > > + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target > > > vect_perm } } } */ > > > + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 > > > "slp2" { target vect_perm } } } */ > > > Index: gcc/tree-vect-stmts.c > > > =================================================================== > > > *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100 > > > --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100 > > > *************** vect_is_simple_use 
(tree operand, vec_in > > > *** 8196,8207 **** > > > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); > > > } > > > ! basic_block bb = gimple_bb (*def_stmt); > > > ! if ((is_a <loop_vec_info> (vinfo) > > > ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, > > > bb)) > > > ! || (is_a <bb_vec_info> (vinfo) > > > ! && (bb != as_a <bb_vec_info> (vinfo)->bb > > > ! || gimple_code (*def_stmt) == GIMPLE_PHI))) > > > *dt = vect_external_def; > > > else > > > { > > > --- 8196,8202 ---- > > > dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); > > > } > > > ! if (! vect_stmt_in_region_p (vinfo, *def_stmt)) > > > *dt = vect_external_def; > > > else > > > { > > > Index: gcc/tree-vectorizer.c > > > =================================================================== > > > *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100 > > > --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100 > > > *************** vect_destroy_datarefs (vec_info *vinfo) > > > *** 350,355 **** > > > --- 350,382 ---- > > > } > > > + /* Return whether STMT is inside the region we try to vectorize. > > > */ > > > + > > > + bool > > > + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt) > > > + { > > > + if (!gimple_bb (stmt)) > > > + return false; > > > + > > > + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo)) > > > + { > > > + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); > > > + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt))) > > > + return false; > > > + } > > > + else > > > + { > > > + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); > > > + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo) > > > + || gimple_uid (stmt) == -1U > > > + || gimple_code (stmt) == GIMPLE_PHI) > > > + return false; > > > + } > > > + > > > + return true; > > > + } > > > + > > > + > > > /* If LOOP has been versioned during ifcvt, return the internal call > > > guarding it. 
*/ > > > *************** pass_slp_vectorize::execute (function *f > > > *** 692,697 **** > > > --- 719,732 ---- > > > scev_initialize (); > > > } > > > + /* Mark all stmts as not belonging to the current region. */ > > > + FOR_EACH_BB_FN (bb, fun) > > > + { > > > + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p > > > (gsi); > > > + gsi_next (&gsi)) > > > + gimple_set_uid (gsi_stmt (gsi), -1); > > > + } > > > + > > > init_stmt_vec_info_vec (); > > > FOR_EACH_BB_FN (bb, fun) > > > Index: gcc/config/aarch64/aarch64.c > > > =================================================================== > > > *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 > > > +0100 > > > --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 +0100 > > > *************** > > > *** 52,57 **** > > > --- 52,58 ---- > > > #include "params.h" > > > #include "gimplify.h" > > > #include "dwarf2.h" > > > + #include "gimple-iterator.h" > > > #include "tree-vectorizer.h" > > > #include "aarch64-cost-tables.h" > > > #include "dumpfile.h" > > > > > -- Richard Biener <rguenther@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg) ^ permalink raw reply [flat|nested] 8+ messages in thread
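To make the UID marking used by vect_stmt_in_region_p easier to follow, here is a minimal standalone sketch of the idea, under the assumption that a statement counts as "in the region" exactly when its UID differs from the all-ones marker. It leaves out the same-basic-block and PHI checks the real function also performs. The names toy_stmt, TOY_UID_OUTSIDE and the toy_* helpers are hypothetical and not part of GCC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Marker UID meaning "not in the region currently being vectorized";
   plays the role of the -1U compared against in the patch.  */
#define TOY_UID_OUTSIDE (~0u)

/* Hypothetical stand-in for a GIMPLE statement carrying only a UID.  */
struct toy_stmt { unsigned uid; };

/* Mark every statement as outside any region, as the pass does once
   up front for all statements of the function.  */
void
toy_reset_uids (struct toy_stmt *stmts, size_t n)
{
  for (size_t i = 0; i < n; i++)
    stmts[i].uid = TOY_UID_OUTSIDE;
}

/* Enter the region [BEGIN, END): give its statements a non-marker UID.  */
void
toy_enter_region (struct toy_stmt *stmts, size_t begin, size_t end)
{
  assert (begin <= end);
  for (size_t i = begin; i < end; i++)
    stmts[i].uid = 0;
}

/* Leave the region [BEGIN, END): restore the marker so a later region
   in the same block does not see stale members.  */
void
toy_leave_region (struct toy_stmt *stmts, size_t begin, size_t end)
{
  assert (begin <= end);
  for (size_t i = begin; i < end; i++)
    stmts[i].uid = TOY_UID_OUTSIDE;
}

/* Membership test: a single UID comparison, no walk over the region.  */
bool
toy_stmt_in_region_p (const struct toy_stmt *stmt)
{
  return stmt->uid != TOY_UID_OUTSIDE;
}
```

The point of the design, as far as the patch shows, is that membership stays a constant-time check per statement while several disjoint regions of one basic block are processed in turn; resetting the marker when a region is destroyed is what keeps those regions from leaking into each other.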
* Re: [PATCH] Make BB vectorizer work on sub-BBs 2015-11-06 11:12 ` Kyrill Tkachov 2015-11-06 11:27 ` Richard Biener @ 2015-11-10 12:56 ` Christophe Lyon 2015-11-10 13:03 ` Richard Biener 1 sibling, 1 reply; 8+ messages in thread From: Christophe Lyon @ 2015-11-10 12:56 UTC (permalink / raw) To: Kyrill Tkachov; +Cc: Richard Biener, gcc-patches On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > Hi Richard, > > > On 06/11/15 11:09, Richard Biener wrote: >> >> On Fri, 6 Nov 2015, Richard Biener wrote: >> >>> The following patch makes the BB vectorizer not only handle BB heads >>> (until the first stmt with a data reference it cannot handle) but >>> arbitrary regions in a BB separated by such stmts. >>> >>> This improves the number of BB vectorizations from 469 to 556 >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray >>> 1x481.wrf failing both patched and unpatched (have to update my >>> config used for such experiments it seems ...) >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. >>> >>> I'm currently re-testing for a cosmetic change I made when writing >>> the changelog. >>> >>> I expected (and there are) some issues with compile-time. Left >>> is unpatched and right is patched. >>> >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) >>> >>> other benchmarks are unchanged. 
I'm double-checking now that a followup
>>> patch I have which re-implements BB vectorization dependence checking
>>> fixes this (that's the only quadraticness I know of).
>>
>> Fixes all but
>>
>> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>
>
> Note that povray is currently suffering from PR 68198
>

Hi,
I've also noticed that the new test bb-slp-38 fails on armeb:
FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block part vectorized" 2
FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block part vectorized" 2

I haven't checked in more detail, maybe it's similar to what we discussed in PR65962

> Kyrill
>
>
>>
>> it even improves compile-time on some:
>>
>> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
>>
>> it also increases the number of vectorized BBs to 722.
>>
>> Needs some work still though.
>>
>> Richard.
>>
>>> Richard.
>>>
>>> 2015-11-06 Richard Biener <rguenther@suse.de>
>>>
>>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>>> members.
>>> (vect_stmt_in_region_p): Declare.
>>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>>> (destroy_bb_vec_info): Likewise.
>>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>>> (vect_get_and_check_slp_defs): Likewise.
>>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>>> (vect_slp_bb): Likewise.
>>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>>> in terms of vect_stmt_in_region_p.
>>> (vect_pattern_recog): Iterate over the BB region.
>>> * tree-vect-stmts.c (vect_is_simple_use): Use
>>> vect_stmt_in_region_p.
>>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>>>
>>> * config/i386/i386.c: Include gimple-iterator.h.
>>> * config/aarch64/aarch64.c: Likewise.
>>>
>>> * gcc.dg/vect/bb-slp-38.c: New testcase.
>>> >>> Index: gcc/tree-vectorizer.h >>> =================================================================== >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 >>> *************** nested_in_vect_loop_p (struct loop *loop >>> *** 390,395 **** >>> --- 390,397 ---- >>> typedef struct _bb_vec_info : public vec_info >>> { >>> basic_block bb; >>> + gimple_stmt_iterator region_begin; >>> + gimple_stmt_iterator region_end; >>> } *bb_vec_info; >>> #define BB_VINFO_BB(B) (B)->bb >>> *************** void vect_pattern_recog (vec_info *); >>> *** 1085,1089 **** >>> --- 1087,1092 ---- >>> /* In tree-vectorizer.c. */ >>> unsigned vectorize_loops (void); >>> void vect_destroy_datarefs (vec_info *); >>> + bool vect_stmt_in_region_p (vec_info *, gimple *); >>> #endif /* GCC_TREE_VECTORIZER_H */ >>> Index: gcc/tree-vect-slp.c >>> =================================================================== >>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100 >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100 >>> *************** vect_get_and_check_slp_defs (vec_info *v >>> *** 209,215 **** >>> unsigned int i, number_of_oprnds; >>> gimple *def_stmt; >>> enum vect_def_type dt = vect_uninitialized_def; >>> - struct loop *loop = NULL; >>> bool pattern = false; >>> slp_oprnd_info oprnd_info; >>> int first_op_idx = 1; >>> --- 209,214 ---- >>> *************** vect_get_and_check_slp_defs (vec_info *v >>> *** 218,226 **** >>> bool first = stmt_num == 0; >>> bool second = stmt_num == 1; >>> - if (is_a <loop_vec_info> (vinfo)) >>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo)); >>> - >>> if (is_gimple_call (stmt)) >>> { >>> number_of_oprnds = gimple_call_num_args (stmt); >>> --- 217,222 ---- >>> *************** again: >>> *** 276,286 **** >>> from the pattern. Check that all the stmts of the node are >>> in the >>> pattern. */ >>> if (def_stmt && gimple_bb (def_stmt) >>> ! 
&& ((is_a <loop_vec_info> (vinfo) >>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))) >>> ! || (is_a <bb_vec_info> (vinfo) >>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info> >>> (vinfo)->bb >>> ! && gimple_code (def_stmt) != GIMPLE_PHI)) >>> && vinfo_for_stmt (def_stmt) >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) >>> --- 272,278 ---- >>> from the pattern. Check that all the stmts of the node are >>> in the >>> pattern. */ >>> if (def_stmt && gimple_bb (def_stmt) >>> ! && vect_stmt_in_region_p (vinfo, def_stmt) >>> && vinfo_for_stmt (def_stmt) >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) >>> *************** vect_detect_hybrid_slp (loop_vec_info lo >>> *** 2076,2091 **** >>> stmt_vec_info structs for all the stmts in it. */ >>> static bb_vec_info >>> ! new_bb_vec_info (basic_block bb) >>> { >>> bb_vec_info res = NULL; >>> gimple_stmt_iterator gsi; >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >>> res->kind = vec_info::bb; >>> BB_VINFO_BB (res) = bb; >>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>> { >>> gimple *stmt = gsi_stmt (gsi); >>> gimple_set_uid (stmt, 0); >>> --- 2068,2088 ---- >>> stmt_vec_info structs for all the stmts in it. */ >>> static bb_vec_info >>> ! new_bb_vec_info (gimple_stmt_iterator region_begin, >>> ! gimple_stmt_iterator region_end) >>> { >>> + basic_block bb = gsi_bb (region_begin); >>> bb_vec_info res = NULL; >>> gimple_stmt_iterator gsi; >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); >>> res->kind = vec_info::bb; >>> BB_VINFO_BB (res) = bb; >>> + res->region_begin = region_begin; >>> + res->region_end = region_end; >>> ! for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end); >>> ! 
gsi_next (&gsi)) >>> { >>> gimple *stmt = gsi_stmt (gsi); >>> gimple_set_uid (stmt, 0); >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf >>> *** 2118,2124 **** >>> bb = BB_VINFO_BB (bb_vinfo); >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >>> { >>> gimple *stmt = gsi_stmt (si); >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> --- 2115,2122 ---- >>> bb = BB_VINFO_BB (bb_vinfo); >>> ! for (si = bb_vinfo->region_begin; >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next >>> (&si)) >>> { >>> gimple *stmt = gsi_stmt (si); >>> stmt_vec_info stmt_info = vinfo_for_stmt (stmt); >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf >>> *** 2126,2131 **** >>> --- 2124,2132 ---- >>> if (stmt_info) >>> /* Free stmt_vec_info. */ >>> free_stmt_vec_info (stmt); >>> + >>> + /* Reset region marker. */ >>> + gimple_set_uid (stmt, -1); >>> } >>> vect_destroy_datarefs (bb_vinfo); >>> *************** vect_bb_slp_scalar_cost (basic_block bb, >>> *** 2247,2254 **** >>> gimple *use_stmt; >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR >>> (def_p)) >>> if (!is_gimple_debug (use_stmt) >>> ! && (gimple_code (use_stmt) == GIMPLE_PHI >>> ! || gimple_bb (use_stmt) != bb >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt >>> (use_stmt)))) >>> { >>> (*life)[i] = true; >>> --- 2248,2255 ---- >>> gimple *use_stmt; >>> FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR >>> (def_p)) >>> if (!is_gimple_debug (use_stmt) >>> ! && (! vect_stmt_in_region_p (vinfo_for_stmt >>> (stmt)->vinfo, >>> ! use_stmt) >>> || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt >>> (use_stmt)))) >>> { >>> (*life)[i] = true; >>> *************** vect_bb_vectorization_profitable_p (bb_v >>> *** 2327,2366 **** >>> /* Check if the basic block can be vectorized. */ >>> static bb_vec_info >>> ! 
vect_slp_analyze_bb_1 (basic_block bb) >>> { >>> bb_vec_info bb_vinfo; >>> vec<slp_instance> slp_instances; >>> slp_instance instance; >>> int i; >>> int min_vf = 2; >>> - unsigned n_stmts = 0; >>> ! bb_vinfo = new_bb_vec_info (bb); >>> if (!bb_vinfo) >>> return NULL; >>> ! /* Gather all data references in the basic-block. */ >>> ! >>> ! for (gimple_stmt_iterator gsi = gsi_start_bb (bb); >>> ! !gsi_end_p (gsi); gsi_next (&gsi)) >>> ! { >>> ! gimple *stmt = gsi_stmt (gsi); >>> ! if (is_gimple_debug (stmt)) >>> ! continue; >>> ! ++n_stmts; >>> ! if (!find_data_references_in_stmt (NULL, stmt, >>> ! &BB_VINFO_DATAREFS (bb_vinfo))) >>> ! { >>> ! /* Mark the rest of the basic-block as unvectorizable. */ >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi)) >>> ! { >>> ! stmt = gsi_stmt (gsi); >>> ! STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false; >>> ! } >>> ! break; >>> ! } >>> ! } >>> /* Analyze the data references. */ >>> --- 2328,2358 ---- >>> /* Check if the basic block can be vectorized. */ >>> static bb_vec_info >>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin, >>> ! gimple_stmt_iterator region_end, >>> ! vec<data_reference_p> datarefs, int n_stmts) >>> { >>> bb_vec_info bb_vinfo; >>> vec<slp_instance> slp_instances; >>> slp_instance instance; >>> int i; >>> int min_vf = 2; >>> ! if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) >>> ! { >>> ! if (dump_enabled_p ()) >>> ! dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> ! "not vectorized: too many instructions in " >>> ! "basic block.\n"); >>> ! free_data_refs (datarefs); >>> ! return NULL; >>> ! } >>> ! >>> ! bb_vinfo = new_bb_vec_info (region_begin, region_end); >>> if (!bb_vinfo) >>> return NULL; >>> ! BB_VINFO_DATAREFS (bb_vinfo) = datarefs; >>> /* Analyze the data references. */ >>> *************** vect_slp_analyze_bb_1 (basic_block bb) >>> *** 2438,2445 **** >>> } >>> /* Mark all the statements that we do not want to vectorize. */ >>> ! 
for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB >>> (bb_vinfo)); >>> ! !gsi_end_p (gsi); gsi_next (&gsi)) >>> { >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); >>> if (STMT_SLP_TYPE (vinfo) != pure_slp) >>> --- 2430,2437 ---- >>> } >>> /* Mark all the statements that we do not want to vectorize. */ >>> ! for (gimple_stmt_iterator gsi = bb_vinfo->region_begin; >>> ! gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next >>> (&gsi)) >>> { >>> stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi)); >>> if (STMT_SLP_TYPE (vinfo) != pure_slp) >>> *************** bool >>> *** 2509,2585 **** >>> vect_slp_bb (basic_block bb) >>> { >>> bb_vec_info bb_vinfo; >>> - int insns = 0; >>> gimple_stmt_iterator gsi; >>> unsigned int vector_sizes; >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> "===vect_slp_analyze_bb===\n"); >>> - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) >>> - { >>> - gimple *stmt = gsi_stmt (gsi); >>> - if (!is_gimple_debug (stmt) >>> - && !gimple_nop_p (stmt) >>> - && gimple_code (stmt) != GIMPLE_LABEL) >>> - insns++; >>> - if (gimple_location (stmt) != UNKNOWN_LOCATION) >>> - vect_location = gimple_location (stmt); >>> - } >>> - >>> - if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB)) >>> - { >>> - if (dump_enabled_p ()) >>> - dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >>> - "not vectorized: too many instructions in " >>> - "basic block.\n"); >>> - >>> - return false; >>> - } >>> - >>> /* Autodetect first vector size we try. */ >>> current_vector_size = 0; >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >>> while (1) >>> { >>> ! bb_vinfo = vect_slp_analyze_bb_1 (bb); >>> ! if (bb_vinfo) >>> { >>> ! if (!dbg_cnt (vect_slp)) >>> ! { >>> ! destroy_bb_vec_info (bb_vinfo); >>> ! return false; >>> ! } >>> if (dump_enabled_p ()) >>> ! 
dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n"); >>> vect_schedule_slp (bb_vinfo); >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> ! "BASIC BLOCK VECTORIZED\n"); >>> destroy_bb_vec_info (bb_vinfo); >>> ! return true; >>> } >>> ! destroy_bb_vec_info (bb_vinfo); >>> vector_sizes &= ~current_vector_size; >>> ! if (vector_sizes == 0 >>> ! || current_vector_size == 0) >>> ! return false; >>> ! /* Try the next biggest vector size. */ >>> ! current_vector_size = 1 << floor_log2 (vector_sizes); >>> ! if (dump_enabled_p ()) >>> ! dump_printf_loc (MSG_NOTE, vect_location, >>> ! "***** Re-trying analysis with " >>> ! "vector size %d\n", current_vector_size); >>> } >>> } >>> --- 2501,2605 ---- >>> vect_slp_bb (basic_block bb) >>> { >>> bb_vec_info bb_vinfo; >>> gimple_stmt_iterator gsi; >>> unsigned int vector_sizes; >>> + bool any_vectorized = false; >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> "===vect_slp_analyze_bb===\n"); >>> /* Autodetect first vector size we try. */ >>> current_vector_size = 0; >>> vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >>> + gsi = gsi_start_bb (bb); >>> + >>> while (1) >>> { >>> ! if (gsi_end_p (gsi)) >>> ! break; >>> ! >>> ! gimple_stmt_iterator region_begin = gsi; >>> ! vec<data_reference_p> datarefs = vNULL; >>> ! int insns = 0; >>> ! >>> ! for (; !gsi_end_p (gsi); gsi_next (&gsi)) >>> { >>> ! gimple *stmt = gsi_stmt (gsi); >>> ! if (is_gimple_debug (stmt)) >>> ! continue; >>> ! insns++; >>> ! >>> ! if (gimple_location (stmt) != UNKNOWN_LOCATION) >>> ! vect_location = gimple_location (stmt); >>> ! >>> ! if (!find_data_references_in_stmt (NULL, stmt, &datarefs)) >>> ! break; >>> ! } >>> ! >>> ! /* Skip leading unhandled stmts. */ >>> ! if (gsi_stmt (region_begin) == gsi_stmt (gsi)) >>> ! { >>> ! gsi_next (&gsi); >>> ! continue; >>> ! } >>> ! >>> ! 
gimple_stmt_iterator region_end = gsi; >>> + bool vectorized = false; >>> + bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end, >>> + datarefs, insns); >>> + if (bb_vinfo >>> + && dbg_cnt (vect_slp)) >>> + { >>> if (dump_enabled_p ()) >>> ! dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB >>> part\n"); >>> vect_schedule_slp (bb_vinfo); >>> if (dump_enabled_p ()) >>> dump_printf_loc (MSG_NOTE, vect_location, >>> ! "basic block part vectorized\n"); >>> destroy_bb_vec_info (bb_vinfo); >>> ! vectorized = true; >>> } >>> + else >>> + destroy_bb_vec_info (bb_vinfo); >>> ! any_vectorized |= vectorized; >>> vector_sizes &= ~current_vector_size; >>> ! if (vectorized >>> ! || vector_sizes == 0 >>> ! || current_vector_size == 0) >>> ! { >>> ! if (gsi_end_p (region_end)) >>> ! break; >>> ! >>> ! /* Skip the unhandled stmt. */ >>> ! gsi_next (&gsi); >>> ! >>> ! /* And reset vector sizes. */ >>> ! current_vector_size = 0; >>> ! vector_sizes = targetm.vectorize.autovectorize_vector_sizes (); >>> ! } >>> ! else >>> ! { >>> ! /* Try the next biggest vector size. */ >>> ! current_vector_size = 1 << floor_log2 (vector_sizes); >>> ! if (dump_enabled_p ()) >>> ! dump_printf_loc (MSG_NOTE, vect_location, >>> ! "***** Re-trying analysis with " >>> ! "vector size %d\n", current_vector_size); >>> ! /* Start over. */ >>> ! gsi = region_begin; >>> ! } >>> } >>> + >>> + return any_vectorized; >>> } >>> Index: gcc/tree-vect-patterns.c >>> =================================================================== >>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178 >>> +0100 >>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100 >>> *************** static bool >>> *** 107,133 **** >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) >>> { >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); >>> ! loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo); >>> ! bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo); >>> ! >>> ! 
if (!gimple_bb (stmt2)) >>> ! return false; >>> ! >>> ! if (loop_vinfo) >>> ! { >>> ! struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >>> ! if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2))) >>> ! return false; >>> ! } >>> ! else >>> ! { >>> ! if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo) >>> ! || gimple_code (stmt2) == GIMPLE_PHI) >>> ! return false; >>> ! } >>> ! >>> ! gcc_assert (vinfo_for_stmt (stmt2)); >>> ! return true; >>> } >>> /* If the LHS of DEF_STMT has a single use, and that statement is >>> --- 107,113 ---- >>> vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2) >>> { >>> stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1); >>> ! return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2); >>> } >>> /* If the LHS of DEF_STMT has a single use, and that statement is >>> *************** vect_pattern_recog (vec_info *vinfo) >>> *** 3611,3643 **** >>> loop = LOOP_VINFO_LOOP (loop_vinfo); >>> bbs = LOOP_VINFO_BBS (loop_vinfo); >>> nbbs = loop->num_nodes; >>> } >>> else >>> { >>> ! bbs = &as_a <bb_vec_info> (vinfo)->bb; >>> ! nbbs = 1; >>> ! } >>> ! >>> ! /* Scan through the loop stmts, applying the pattern recognition >>> ! functions starting at each stmt visited: */ >>> ! for (i = 0; i < nbbs; i++) >>> ! { >>> ! basic_block bb = bbs[i]; >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >>> ! { >>> ! if (is_a <bb_vec_info> (vinfo) >>> ! && (stmt = gsi_stmt (si)) >>> && vinfo_for_stmt (stmt) >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >>> ! continue; >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. >>> */ >>> ! for (j = 0; j < NUM_PATTERNS; j++) >>> ! { >>> vect_recog_func = vect_vect_recog_func_ptrs[j]; >>> vect_pattern_recog_1 (vect_recog_func, si, >>> &stmts_to_replace); >>> ! } >>> ! 
} >>> } >>> } >>> --- 3591,3632 ---- >>> loop = LOOP_VINFO_LOOP (loop_vinfo); >>> bbs = LOOP_VINFO_BBS (loop_vinfo); >>> nbbs = loop->num_nodes; >>> + >>> + /* Scan through the loop stmts, applying the pattern recognition >>> + functions starting at each stmt visited: */ >>> + for (i = 0; i < nbbs; i++) >>> + { >>> + basic_block bb = bbs[i]; >>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >>> + { >>> + /* Scan over all generic vect_recog_xxx_pattern functions. >>> */ >>> + for (j = 0; j < NUM_PATTERNS; j++) >>> + { >>> + vect_recog_func = vect_vect_recog_func_ptrs[j]; >>> + vect_pattern_recog_1 (vect_recog_func, si, >>> + &stmts_to_replace); >>> + } >>> + } >>> + } >>> } >>> else >>> { >>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); >>> ! for (si = bb_vinfo->region_begin; >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next >>> (&si)) >>> ! { >>> ! if ((stmt = gsi_stmt (si)) >>> && vinfo_for_stmt (stmt) >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >>> ! continue; >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */ >>> ! for (j = 0; j < NUM_PATTERNS; j++) >>> ! { >>> vect_recog_func = vect_vect_recog_func_ptrs[j]; >>> vect_pattern_recog_1 (vect_recog_func, si, >>> &stmts_to_replace); >>> ! } >>> ! } >>> } >>> } >>> Index: gcc/config/i386/i386.c >>> =================================================================== >>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100 >>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100 >>> *************** along with GCC; see the file COPYING3. 
>>> *** 64,69 **** >>> --- 64,70 ---- >>> #include "context.h" >>> #include "pass_manager.h" >>> #include "target-globals.h" >>> + #include "gimple-iterator.h" >>> #include "tree-vectorizer.h" >>> #include "shrink-wrap.h" >>> #include "builtins.h" >>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c >>> =================================================================== >>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000 >>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 >>> 14:00:48.177644327 +0100 >>> *************** >>> *** 0 **** >>> --- 1,44 ---- >>> + /* { dg-require-effective-target vect_int } */ >>> + >>> + #include "tree-vect.h" >>> + >>> + extern void abort (void); >>> + >>> + int a[8], b[8]; >>> + int x; >>> + >>> + void __attribute__((noinline,noclone)) >>> + bar (void) >>> + { >>> + x = 1; >>> + } >>> + >>> + void __attribute__((noinline,noclone)) >>> + foo(void) >>> + { >>> + a[0] = b[0]; >>> + a[1] = b[0]; >>> + a[2] = b[3]; >>> + a[3] = b[3]; >>> + bar (); >>> + a[4] = b[4]; >>> + a[5] = b[7]; >>> + a[6] = b[4]; >>> + a[7] = b[7]; >>> + } >>> + >>> + int main() >>> + { >>> + int i; >>> + check_vect (); >>> + for (i = 0; i < 8; ++i) >>> + b[i] = i; >>> + foo (); >>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3 >>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7) >>> + abort (); >>> + return 0; >>> + } >>> + >>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target >>> vect_perm } } } */ >>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 >>> "slp2" { target vect_perm } } } */ >>> Index: gcc/tree-vect-stmts.c >>> =================================================================== >>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100 >>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100 >>> *************** vect_is_simple_use (tree operand, vec_in >>> *** 8196,8207 **** >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >>> } >>> ! 
basic_block bb = gimple_bb (*def_stmt); >>> ! if ((is_a <loop_vec_info> (vinfo) >>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, >>> bb)) >>> ! || (is_a <bb_vec_info> (vinfo) >>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb >>> ! || gimple_code (*def_stmt) == GIMPLE_PHI))) >>> *dt = vect_external_def; >>> else >>> { >>> --- 8196,8202 ---- >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >>> } >>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt)) >>> *dt = vect_external_def; >>> else >>> { >>> Index: gcc/tree-vectorizer.c >>> =================================================================== >>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100 >>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100 >>> *************** vect_destroy_datarefs (vec_info *vinfo) >>> *** 350,355 **** >>> --- 350,382 ---- >>> } >>> + /* Return whether STMT is inside the region we try to vectorize. >>> */ >>> + >>> + bool >>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt) >>> + { >>> + if (!gimple_bb (stmt)) >>> + return false; >>> + >>> + if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo)) >>> + { >>> + struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo); >>> + if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt))) >>> + return false; >>> + } >>> + else >>> + { >>> + bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); >>> + if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo) >>> + || gimple_uid (stmt) == -1U >>> + || gimple_code (stmt) == GIMPLE_PHI) >>> + return false; >>> + } >>> + >>> + return true; >>> + } >>> + >>> + >>> /* If LOOP has been versioned during ifcvt, return the internal call >>> guarding it. */ >>> *************** pass_slp_vectorize::execute (function *f >>> *** 692,697 **** >>> --- 719,732 ---- >>> scev_initialize (); >>> } >>> + /* Mark all stmts as not belonging to the current region. 
*/ >>> + FOR_EACH_BB_FN (bb, fun) >>> + { >>> + for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p >>> (gsi); >>> + gsi_next (&gsi)) >>> + gimple_set_uid (gsi_stmt (gsi), -1); >>> + } >>> + >>> init_stmt_vec_info_vec (); >>> FOR_EACH_BB_FN (bb, fun) >>> Index: gcc/config/aarch64/aarch64.c >>> =================================================================== >>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112 >>> +0100 >>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027 >>> +0100 >>> *************** >>> *** 52,57 **** >>> --- 52,58 ---- >>> #include "params.h" >>> #include "gimplify.h" >>> #include "dwarf2.h" >>> + #include "gimple-iterator.h" >>> #include "tree-vectorizer.h" >>> #include "aarch64-cost-tables.h" >>> #include "dumpfile.h" >>> >
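A minimal sketch of the region splitting that the quoted vect_slp_bb change describes, written outside of GCC and under stated assumptions. None of the names below are GCC types or functions: struct toy_stmt, struct toy_region, split_regions, enter_region, leave_region and toy_stmt_in_region_p are hypothetical stand-ins. The `handled` flag is assumed to play the role of find_data_references_in_stmt succeeding, and the `uid` field imitates the -1 "not in the current region" marker that the patch sets with gimple_set_uid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GIMPLE statement; not a GCC type.  */
struct toy_stmt
{
  const char *text;
  bool handled;   /* false models a stmt whose data refs cannot be analyzed */
  unsigned uid;   /* NOT_IN_REGION unless inside the region being analyzed */
};

#define NOT_IN_REGION ((unsigned) -1)

/* Half-open range [begin, end) of statement indices.  */
struct toy_region
{
  size_t begin;
  size_t end;
};

/* Split STMTS[0..N) into maximal runs of handled statements.  Stores at
   most MAX_OUT of them in OUT and returns the number of runs found.  */
static size_t
split_regions (const struct toy_stmt *stmts, size_t n,
               struct toy_region *out, size_t max_out)
{
  size_t count = 0;
  size_t i = 0;

  assert (out != NULL || max_out == 0);
  while (i < n)
    {
      /* Skip leading unhandled stmts.  */
      if (!stmts[i].handled)
        {
          i++;
          continue;
        }

      /* Extend the region up to, but not including, the next
         unhandled stmt or the end of the block.  */
      size_t begin = i;
      while (i < n && stmts[i].handled)
        i++;

      if (count < max_out)
        {
          out[count].begin = begin;
          out[count].end = i;
        }
      count++;
    }
  return count;
}

/* Mark the statements of R as members of the current region (uid 0).  */
static void
enter_region (struct toy_stmt *stmts, struct toy_region r)
{
  for (size_t i = r.begin; i < r.end; i++)
    stmts[i].uid = 0;
}

/* Reset the region marker once the region has been processed.  */
static void
leave_region (struct toy_stmt *stmts, struct toy_region r)
{
  for (size_t i = r.begin; i < r.end; i++)
    stmts[i].uid = NOT_IN_REGION;
}

/* Membership test keyed on the uid marker only.  */
static bool
toy_stmt_in_region_p (const struct toy_stmt *stmt)
{
  return stmt->uid != NOT_IN_REGION;
}
```

If each source line of foo () in bb-slp-38.c is modeled as one statement (a simplification), and the call to bar () is assumed to be the statement that cannot be handled, split_regions reports two regions: the four stores before the call and the four after it. That is in line with the testcase scanning for "basic block part vectorized" twice.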
* Re: [PATCH] Make BB vectorizer work on sub-BBs 2015-11-10 12:56 ` Christophe Lyon @ 2015-11-10 13:03 ` Richard Biener 2015-11-10 15:20 ` Christophe Lyon 0 siblings, 1 reply; 8+ messages in thread From: Richard Biener @ 2015-11-10 13:03 UTC (permalink / raw) To: Christophe Lyon; +Cc: Kyrill Tkachov, gcc-patches On Tue, 10 Nov 2015, Christophe Lyon wrote: > On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote: > > Hi Richard, > > > > > > On 06/11/15 11:09, Richard Biener wrote: > >> > >> On Fri, 6 Nov 2015, Richard Biener wrote: > >> > >>> The following patch makes the BB vectorizer not only handle BB heads > >>> (until the first stmt with a data reference it cannot handle) but > >>> arbitrary regions in a BB separated by such stmts. > >>> > >>> This improves the number of BB vectorizations from 469 to 556 > >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and > >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray > >>> 1x481.wrf failing both patched and unpatched (have to update my > >>> config used for such experiments it seems ...) > >>> > >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. > >>> > >>> I'm currently re-testing for a cosmetic change I made when writing > >>> the changelog. > >>> > >>> I expected (and there are) some issues with compile-time. Left > >>> is unpatched and right is patched. > >>> > >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) > >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) > >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) > >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) > >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) > >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) > >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) > >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) > >>> > >>> other benchmarks are unchanged. 
I'm double-checking now that a followup > >>> patch I have which re-implements BB vectorization dependence checking > >>> fixes this (that's the only quadraticness I know of). > >> > >> Fixes all but > >> > >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406) > > > > > > Note that povray is currently suffering from PR 68198 > > > > Hi, > > I've also noticed that the new test bb-slp-38 fails on armeb: > FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects > scan-tree-dump-times slp2 "basic block part vectorized" 2 > FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block > part vectorized" 2 > > I haven't checked in more detail, maybe it's similar to what we > discussed in PR65962 Maybe though there is no misalignment involved as far as I can see. Please open a bug and attach vectorizer dumps. Richard. > > Kyrill > > > > > >> > >> it even improves compile-time on some: > >> > >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21) > >> > >> it also increases the number of vectorized BBs to 722. > >> > >> Needs some work still though. > >> > >> Richard. > >> > >>> Richard. > >>> > >>> 2015-11-06 Richard Biener <rguenther@suse.de> > >>> > >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end > >>> members. > >>> (vect_stmt_in_region_p): Declare. > >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region. > >>> (destroy_bb_vec_info): Likewise. > >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p. > >>> (vect_get_and_check_slp_defs): Likewise. > >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs. > >>> (vect_slp_bb): Likewise. > >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement > >>> in terms of vect_stmt_in_region_p. > >>> (vect_pattern_recog): Iterate over the BB region. > >>> * tree-vect-stmts.c (vect_is_simple_use): Use > >>> vect_stmt_in_region_p. > >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function. > >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1. 
> >>> > >>> * config/i386/i386.c: Include gimple-iterator.h. > >>> * config/aarch64/aarch64.c: Likewise. > >>> > >>> * gcc.dg/vect/bb-slp-38.c: New testcase. > >>> > >>> Index: gcc/tree-vectorizer.h > >>> =================================================================== > >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 > >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 > >>> *************** nested_in_vect_loop_p (struct loop *loop > >>> *** 390,395 **** > >>> --- 390,397 ---- > >>> typedef struct _bb_vec_info : public vec_info > >>> { > >>> basic_block bb; > >>> + gimple_stmt_iterator region_begin; > >>> + gimple_stmt_iterator region_end; > >>> } *bb_vec_info; > >>> #define BB_VINFO_BB(B) (B)->bb > >>> *************** void vect_pattern_recog (vec_info *); > >>> *** 1085,1089 **** > >>> --- 1087,1092 ---- > >>> /* In tree-vectorizer.c. */ > >>> unsigned vectorize_loops (void); > >>> void vect_destroy_datarefs (vec_info *); > >>> + bool vect_stmt_in_region_p (vec_info *, gimple *); > >>> #endif /* GCC_TREE_VECTORIZER_H */ > >>> Index: gcc/tree-vect-slp.c > >>> =================================================================== > >>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100 > >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100 > >>> *************** vect_get_and_check_slp_defs (vec_info *v > >>> *** 209,215 **** > >>> unsigned int i, number_of_oprnds; > >>> gimple *def_stmt; > >>> enum vect_def_type dt = vect_uninitialized_def; > >>> - struct loop *loop = NULL; > >>> bool pattern = false; > >>> slp_oprnd_info oprnd_info; > >>> int first_op_idx = 1; > >>> --- 209,214 ---- > >>> *************** vect_get_and_check_slp_defs (vec_info *v > >>> *** 218,226 **** > >>> bool first = stmt_num == 0; > >>> bool second = stmt_num == 1; > >>> - if (is_a <loop_vec_info> (vinfo)) > >>> - loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo)); > >>> - > >>> if (is_gimple_call (stmt)) > >>> { > >>> 
number_of_oprnds = gimple_call_num_args (stmt); > >>> --- 217,222 ---- > >>> *************** again: > >>> *** 276,286 **** > >>> from the pattern. Check that all the stmts of the node are > >>> in the > >>> pattern. */ > >>> if (def_stmt && gimple_bb (def_stmt) > >>> ! && ((is_a <loop_vec_info> (vinfo) > >>> ! && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt))) > >>> ! || (is_a <bb_vec_info> (vinfo) > >>> ! && gimple_bb (def_stmt) == as_a <bb_vec_info> > >>> (vinfo)->bb > >>> ! && gimple_code (def_stmt) != GIMPLE_PHI)) > >>> && vinfo_for_stmt (def_stmt) > >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) > >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) > >>> --- 272,278 ---- > >>> from the pattern. Check that all the stmts of the node are > >>> in the > >>> pattern. */ > >>> if (def_stmt && gimple_bb (def_stmt) > >>> ! && vect_stmt_in_region_p (vinfo, def_stmt) > >>> && vinfo_for_stmt (def_stmt) > >>> && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt)) > >>> && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt)) > >>> *************** vect_detect_hybrid_slp (loop_vec_info lo > >>> *** 2076,2091 **** > >>> stmt_vec_info structs for all the stmts in it. */ > >>> static bb_vec_info > >>> ! new_bb_vec_info (basic_block bb) > >>> { > >>> bb_vec_info res = NULL; > >>> gimple_stmt_iterator gsi; > >>> res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info)); > >>> res->kind = vec_info::bb; > >>> BB_VINFO_BB (res) = bb; > >>> ! for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > >>> { > >>> gimple *stmt = gsi_stmt (gsi); > >>> gimple_set_uid (stmt, 0); > >>> --- 2068,2088 ---- > >>> stmt_vec_info structs for all the stmts in it. */ > >>> static bb_vec_info > >>> ! new_bb_vec_info (gimple_stmt_iterator region_begin, > >>> ! 
gimple_stmt_iterator region_end)
> >>>   {
> >>> +   basic_block bb = gsi_bb (region_begin);
> >>>     bb_vec_info res = NULL;
> >>>     gimple_stmt_iterator gsi;
> >>>     res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
> >>>     res->kind = vec_info::bb;
> >>>     BB_VINFO_BB (res) = bb;
> >>> +   res->region_begin = region_begin;
> >>> +   res->region_end = region_end;
> >>> !   for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
> >>> !        gsi_next (&gsi))
> >>>       {
> >>>         gimple *stmt = gsi_stmt (gsi);
> >>>         gimple_set_uid (stmt, 0);
> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> >>> *** 2118,2124 ****
> >>>     bb = BB_VINFO_BB (bb_vinfo);
> >>> !   for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>>       {
> >>>         gimple *stmt = gsi_stmt (si);
> >>>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>> --- 2115,2122 ----
> >>>     bb = BB_VINFO_BB (bb_vinfo);
> >>> !   for (si = bb_vinfo->region_begin;
> >>> !        gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&si))
> >>>       {
> >>>         gimple *stmt = gsi_stmt (si);
> >>>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
> >>> *** 2126,2131 ****
> >>> --- 2124,2132 ----
> >>>         if (stmt_info)
> >>>           /* Free stmt_vec_info. */
> >>>           free_stmt_vec_info (stmt);
> >>> +
> >>> +       /* Reset region marker. */
> >>> +       gimple_set_uid (stmt, -1);
> >>>       }
> >>>     vect_destroy_datarefs (bb_vinfo);
> >>> *************** vect_bb_slp_scalar_cost (basic_block bb,
> >>> *** 2247,2254 ****
> >>>             gimple *use_stmt;
> >>>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
> >>> (def_p))
> >>>               if (!is_gimple_debug (use_stmt)
> >>> !                 && (gimple_code (use_stmt) == GIMPLE_PHI
> >>> !                     || gimple_bb (use_stmt) != bb
> >>>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
> >>> (use_stmt))))
> >>>                 {
> >>>                   (*life)[i] = true;
> >>> --- 2248,2255 ----
> >>>             gimple *use_stmt;
> >>>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
> >>> (def_p))
> >>>               if (!is_gimple_debug (use_stmt)
> >>> !                 && (! vect_stmt_in_region_p (vinfo_for_stmt
> >>> (stmt)->vinfo,
> >>> !                                              use_stmt)
> >>>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
> >>> (use_stmt))))
> >>>                 {
> >>>                   (*life)[i] = true;
> >>> *************** vect_bb_vectorization_profitable_p (bb_v
> >>> *** 2327,2366 ****
> >>>   /* Check if the basic block can be vectorized. */
> >>>   static bb_vec_info
> >>> ! vect_slp_analyze_bb_1 (basic_block bb)
> >>>   {
> >>>     bb_vec_info bb_vinfo;
> >>>     vec<slp_instance> slp_instances;
> >>>     slp_instance instance;
> >>>     int i;
> >>>     int min_vf = 2;
> >>> -   unsigned n_stmts = 0;
> >>> !   bb_vinfo = new_bb_vec_info (bb);
> >>>     if (!bb_vinfo)
> >>>       return NULL;
> >>> !   /* Gather all data references in the basic-block. */
> >>> !
> >>> !   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> !        !gsi_end_p (gsi); gsi_next (&gsi))
> >>> !     {
> >>> !       gimple *stmt = gsi_stmt (gsi);
> >>> !       if (is_gimple_debug (stmt))
> >>> !         continue;
> >>> !       ++n_stmts;
> >>> !       if (!find_data_references_in_stmt (NULL, stmt,
> >>> !                                          &BB_VINFO_DATAREFS (bb_vinfo)))
> >>> !         {
> >>> !           /* Mark the rest of the basic-block as unvectorizable. */
> >>> !           for (; !gsi_end_p (gsi); gsi_next (&gsi))
> >>> !             {
> >>> !               stmt = gsi_stmt (gsi);
> >>> !               STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
> >>> !             }
> >>> !           break;
> >>> !         }
> >>> !     }
> >>>     /* Analyze the data references. */
> >>> --- 2328,2358 ----
> >>>   /* Check if the basic block can be vectorized. */
> >>>   static bb_vec_info
> >>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
> >>> !                        gimple_stmt_iterator region_end,
> >>> !                        vec<data_reference_p> datarefs, int n_stmts)
> >>>   {
> >>>     bb_vec_info bb_vinfo;
> >>>     vec<slp_instance> slp_instances;
> >>>     slp_instance instance;
> >>>     int i;
> >>>     int min_vf = 2;
> >>> !   if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> >>> !     {
> >>> !       if (dump_enabled_p ())
> >>> !         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> !                          "not vectorized: too many instructions in "
> >>> !                          "basic block.\n");
> >>> !       free_data_refs (datarefs);
> >>> !       return NULL;
> >>> !     }
> >>> !
> >>> !   bb_vinfo = new_bb_vec_info (region_begin, region_end);
> >>>     if (!bb_vinfo)
> >>>       return NULL;
> >>> !   BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
> >>>     /* Analyze the data references. */
> >>> *************** vect_slp_analyze_bb_1 (basic_block bb)
> >>> *** 2438,2445 ****
> >>>       }
> >>>     /* Mark all the statements that we do not want to vectorize. */
> >>> !   for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB
> >>> (bb_vinfo));
> >>> !        !gsi_end_p (gsi); gsi_next (&gsi))
> >>>       {
> >>>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> >>>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
> >>> --- 2430,2437 ----
> >>>       }
> >>>     /* Mark all the statements that we do not want to vectorize. */
> >>> !   for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
> >>> !        gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&gsi))
> >>>       {
> >>>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
> >>>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
> >>> *************** bool
> >>> *** 2509,2585 ****
> >>>   vect_slp_bb (basic_block bb)
> >>>   {
> >>>     bb_vec_info bb_vinfo;
> >>> -   int insns = 0;
> >>>     gimple_stmt_iterator gsi;
> >>>     unsigned int vector_sizes;
> >>>     if (dump_enabled_p ())
> >>>       dump_printf_loc (MSG_NOTE, vect_location,
> >>> "===vect_slp_analyze_bb===\n");
> >>> -   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> >>> -     {
> >>> -       gimple *stmt = gsi_stmt (gsi);
> >>> -       if (!is_gimple_debug (stmt)
> >>> -           && !gimple_nop_p (stmt)
> >>> -           && gimple_code (stmt) != GIMPLE_LABEL)
> >>> -         insns++;
> >>> -       if (gimple_location (stmt) != UNKNOWN_LOCATION)
> >>> -         vect_location = gimple_location (stmt);
> >>> -     }
> >>> -
> >>> -   if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
> >>> -     {
> >>> -       if (dump_enabled_p ())
> >>> -         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> >>> -                          "not vectorized: too many instructions in "
> >>> -                          "basic block.\n");
> >>> -
> >>> -       return false;
> >>> -     }
> >>> -
> >>>     /* Autodetect first vector size we try. */
> >>>     current_vector_size = 0;
> >>>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>>     while (1)
> >>>       {
> >>> !       bb_vinfo = vect_slp_analyze_bb_1 (bb);
> >>> !       if (bb_vinfo)
> >>>           {
> >>> !           if (!dbg_cnt (vect_slp))
> >>> !             {
> >>> !               destroy_bb_vec_info (bb_vinfo);
> >>> !               return false;
> >>> !             }
> >>>             if (dump_enabled_p ())
> >>> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
> >>>             vect_schedule_slp (bb_vinfo);
> >>>             if (dump_enabled_p ())
> >>>               dump_printf_loc (MSG_NOTE, vect_location,
> >>> !                              "BASIC BLOCK VECTORIZED\n");
> >>>             destroy_bb_vec_info (bb_vinfo);
> >>> !           return true;
> >>>           }
> >>> !       destroy_bb_vec_info (bb_vinfo);
> >>>         vector_sizes &= ~current_vector_size;
> >>> !       if (vector_sizes == 0
> >>> !           || current_vector_size == 0)
> >>> !         return false;
> >>> !       /* Try the next biggest vector size. */
> >>> !       current_vector_size = 1 << floor_log2 (vector_sizes);
> >>> !       if (dump_enabled_p ())
> >>> !         dump_printf_loc (MSG_NOTE, vect_location,
> >>> !                          "***** Re-trying analysis with "
> >>> !                          "vector size %d\n", current_vector_size);
> >>>       }
> >>>   }
> >>> --- 2501,2605 ----
> >>>   vect_slp_bb (basic_block bb)
> >>>   {
> >>>     bb_vec_info bb_vinfo;
> >>>     gimple_stmt_iterator gsi;
> >>>     unsigned int vector_sizes;
> >>> +   bool any_vectorized = false;
> >>>     if (dump_enabled_p ())
> >>>       dump_printf_loc (MSG_NOTE, vect_location,
> >>> "===vect_slp_analyze_bb===\n");
> >>>     /* Autodetect first vector size we try. */
> >>>     current_vector_size = 0;
> >>>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>> +   gsi = gsi_start_bb (bb);
> >>> +
> >>>     while (1)
> >>>       {
> >>> !       if (gsi_end_p (gsi))
> >>> !         break;
> >>> !
> >>> !       gimple_stmt_iterator region_begin = gsi;
> >>> !       vec<data_reference_p> datarefs = vNULL;
> >>> !       int insns = 0;
> >>> !
> >>> !       for (; !gsi_end_p (gsi); gsi_next (&gsi))
> >>>           {
> >>> !           gimple *stmt = gsi_stmt (gsi);
> >>> !           if (is_gimple_debug (stmt))
> >>> !             continue;
> >>> !           insns++;
> >>> !
> >>> !           if (gimple_location (stmt) != UNKNOWN_LOCATION)
> >>> !             vect_location = gimple_location (stmt);
> >>> !
> >>> !           if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
> >>> !             break;
> >>> !         }
> >>> !
> >>> !       /* Skip leading unhandled stmts. */
> >>> !       if (gsi_stmt (region_begin) == gsi_stmt (gsi))
> >>> !         {
> >>> !           gsi_next (&gsi);
> >>> !           continue;
> >>> !         }
> >>> !
> >>> !       gimple_stmt_iterator region_end = gsi;
> >>> +       bool vectorized = false;
> >>> +       bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
> >>> +                                         datarefs, insns);
> >>> +       if (bb_vinfo
> >>> +           && dbg_cnt (vect_slp))
> >>> +         {
> >>>             if (dump_enabled_p ())
> >>> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB
> >>> part\n");
> >>>             vect_schedule_slp (bb_vinfo);
> >>>             if (dump_enabled_p ())
> >>>               dump_printf_loc (MSG_NOTE, vect_location,
> >>> !                              "basic block part vectorized\n");
> >>>             destroy_bb_vec_info (bb_vinfo);
> >>> !           vectorized = true;
> >>>           }
> >>> +       else
> >>> +         destroy_bb_vec_info (bb_vinfo);
> >>> !       any_vectorized |= vectorized;
> >>>         vector_sizes &= ~current_vector_size;
> >>> !       if (vectorized
> >>> !           || vector_sizes == 0
> >>> !           || current_vector_size == 0)
> >>> !         {
> >>> !           if (gsi_end_p (region_end))
> >>> !             break;
> >>> !
> >>> !           /* Skip the unhandled stmt. */
> >>> !           gsi_next (&gsi);
> >>> !
> >>> !           /* And reset vector sizes. */
> >>> !           current_vector_size = 0;
> >>> !           vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
> >>> !         }
> >>> !       else
> >>> !         {
> >>> !           /* Try the next biggest vector size. */
> >>> !           current_vector_size = 1 << floor_log2 (vector_sizes);
> >>> !           if (dump_enabled_p ())
> >>> !             dump_printf_loc (MSG_NOTE, vect_location,
> >>> !                              "***** Re-trying analysis with "
> >>> !                              "vector size %d\n", current_vector_size);
> >>> !           /* Start over. */
> >>> !           gsi = region_begin;
> >>> !         }
> >>>       }
> >>> +
> >>> +   return any_vectorized;
> >>>   }
> >>> Index: gcc/tree-vect-patterns.c
> >>> ===================================================================
> >>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178
> >>> +0100
> >>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
> >>> *************** static bool
> >>> *** 107,133 ****
> >>>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> >>>   {
> >>>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> >>> !   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
> >>> !   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
> >>> !
> >>> !   if (!gimple_bb (stmt2))
> >>> !     return false;
> >>> !
> >>> !   if (loop_vinfo)
> >>> !     {
> >>> !       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> !       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
> >>> !         return false;
> >>> !     }
> >>> !   else
> >>> !     {
> >>> !       if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
> >>> !           || gimple_code (stmt2) == GIMPLE_PHI)
> >>> !         return false;
> >>> !     }
> >>> !
> >>> !   gcc_assert (vinfo_for_stmt (stmt2));
> >>> !   return true;
> >>>   }
> >>>   /* If the LHS of DEF_STMT has a single use, and that statement is
> >>> --- 107,113 ----
> >>>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
> >>>   {
> >>>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
> >>> !   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
> >>>   }
> >>>   /* If the LHS of DEF_STMT has a single use, and that statement is
> >>> *************** vect_pattern_recog (vec_info *vinfo)
> >>> *** 3611,3643 ****
> >>>         loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>>         bbs = LOOP_VINFO_BBS (loop_vinfo);
> >>>         nbbs = loop->num_nodes;
> >>>       }
> >>>     else
> >>>       {
> >>> !       bbs = &as_a <bb_vec_info> (vinfo)->bb;
> >>> !       nbbs = 1;
> >>> !     }
> >>> !
> >>> !   /* Scan through the loop stmts, applying the pattern recognition
> >>> !      functions starting at each stmt visited: */
> >>> !   for (i = 0; i < nbbs; i++)
> >>> !     {
> >>> !       basic_block bb = bbs[i];
> >>> !       for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>> !         {
> >>> !           if (is_a <bb_vec_info> (vinfo)
> >>> !               && (stmt = gsi_stmt (si))
> >>>                 && vinfo_for_stmt (stmt)
> >>>                 && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> >>> !             continue;
> >>> !           /* Scan over all generic vect_recog_xxx_pattern functions.
> >>> */
> >>> !           for (j = 0; j < NUM_PATTERNS; j++)
> >>> !             {
> >>>                 vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>>                 vect_pattern_recog_1 (vect_recog_func, si,
> >>>                                       &stmts_to_replace);
> >>> !             }
> >>> !         }
> >>>       }
> >>>   }
> >>> --- 3591,3632 ----
> >>>         loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>>         bbs = LOOP_VINFO_BBS (loop_vinfo);
> >>>         nbbs = loop->num_nodes;
> >>> +
> >>> +       /* Scan through the loop stmts, applying the pattern recognition
> >>> +          functions starting at each stmt visited: */
> >>> +       for (i = 0; i < nbbs; i++)
> >>> +         {
> >>> +           basic_block bb = bbs[i];
> >>> +           for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
> >>> +             {
> >>> +               /* Scan over all generic vect_recog_xxx_pattern functions.
> >>> */
> >>> +               for (j = 0; j < NUM_PATTERNS; j++)
> >>> +                 {
> >>> +                   vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>> +                   vect_pattern_recog_1 (vect_recog_func, si,
> >>> +                                         &stmts_to_replace);
> >>> +                 }
> >>> +             }
> >>> +         }
> >>>       }
> >>>     else
> >>>       {
> >>> !       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> >>> !       for (si = bb_vinfo->region_begin;
> >>> !            gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
> >>> (&si))
> >>> !         {
> >>> !           if ((stmt = gsi_stmt (si))
> >>>                 && vinfo_for_stmt (stmt)
> >>>                 && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)))
> >>> !             continue;
> >>> !           /* Scan over all generic vect_recog_xxx_pattern functions. */
> >>> !           for (j = 0; j < NUM_PATTERNS; j++)
> >>> !             {
> >>>                 vect_recog_func = vect_vect_recog_func_ptrs[j];
> >>>                 vect_pattern_recog_1 (vect_recog_func, si,
> >>>                                       &stmts_to_replace);
> >>> !             }
> >>> !         }
> >>>       }
> >>>   }
> >>> Index: gcc/config/i386/i386.c
> >>> ===================================================================
> >>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100
> >>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100
> >>> *************** along with GCC; see the file COPYING3.
> >>> *** 64,69 ****
> >>> --- 64,70 ----
> >>>   #include "context.h"
> >>>   #include "pass_manager.h"
> >>>   #include "target-globals.h"
> >>> + #include "gimple-iterator.h"
> >>>   #include "tree-vectorizer.h"
> >>>   #include "shrink-wrap.h"
> >>>   #include "builtins.h"
> >>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c
> >>> ===================================================================
> >>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000
> >>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05
> >>> 14:00:48.177644327 +0100
> >>> ***************
> >>> *** 0 ****
> >>> --- 1,44 ----
> >>> + /* { dg-require-effective-target vect_int } */
> >>> +
> >>> + #include "tree-vect.h"
> >>> +
> >>> + extern void abort (void);
> >>> +
> >>> + int a[8], b[8];
> >>> + int x;
> >>> +
> >>> + void __attribute__((noinline,noclone))
> >>> + bar (void)
> >>> + {
> >>> +   x = 1;
> >>> + }
> >>> +
> >>> + void __attribute__((noinline,noclone))
> >>> + foo(void)
> >>> + {
> >>> +   a[0] = b[0];
> >>> +   a[1] = b[0];
> >>> +   a[2] = b[3];
> >>> +   a[3] = b[3];
> >>> +   bar ();
> >>> +   a[4] = b[4];
> >>> +   a[5] = b[7];
> >>> +   a[6] = b[4];
> >>> +   a[7] = b[7];
> >>> + }
> >>> +
> >>> + int main()
> >>> + {
> >>> +   int i;
> >>> +   check_vect ();
> >>> +   for (i = 0; i < 8; ++i)
> >>> +     b[i] = i;
> >>> +   foo ();
> >>> +   if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3
> >>> +       || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7)
> >>> +     abort ();
> >>> +   return 0;
> >>> + }
> >>> +
> >>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target
> >>> vect_perm } } } */
> >>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2
> >>> "slp2" { target vect_perm } } } */
> >>> Index: gcc/tree-vect-stmts.c
> >>> ===================================================================
> >>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100
> >>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100
> >>> *************** vect_is_simple_use (tree operand, vec_in
> >>> *** 8196,8207 ****
> >>>         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> >>>       }
> >>> !   basic_block bb = gimple_bb (*def_stmt);
> >>> !   if ((is_a <loop_vec_info> (vinfo)
> >>> !        && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop,
> >>> bb))
> >>> !       || (is_a <bb_vec_info> (vinfo)
> >>> !           && (bb != as_a <bb_vec_info> (vinfo)->bb
> >>> !               || gimple_code (*def_stmt) == GIMPLE_PHI)))
> >>>       *dt = vect_external_def;
> >>>     else
> >>>       {
> >>> --- 8196,8202 ----
> >>>         dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0);
> >>>       }
> >>> !   if (! vect_stmt_in_region_p (vinfo, *def_stmt))
> >>>       *dt = vect_external_def;
> >>>     else
> >>>       {
> >>> Index: gcc/tree-vectorizer.c
> >>> ===================================================================
> >>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100
> >>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100
> >>> *************** vect_destroy_datarefs (vec_info *vinfo)
> >>> *** 350,355 ****
> >>> --- 350,382 ----
> >>>   }
> >>> + /* Return whether STMT is inside the region we try to vectorize.
> >>> */
> >>> +
> >>> + bool
> >>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
> >>> + {
> >>> +   if (!gimple_bb (stmt))
> >>> +     return false;
> >>> +
> >>> +   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
> >>> +     {
> >>> +       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
> >>> +       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> >>> +         return false;
> >>> +     }
> >>> +   else
> >>> +     {
> >>> +       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
> >>> +       if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
> >>> +           || gimple_uid (stmt) == -1U
> >>> +           || gimple_code (stmt) == GIMPLE_PHI)
> >>> +         return false;
> >>> +     }
> >>> +
> >>> +   return true;
> >>> + }
> >>> +
> >>> +
> >>>   /* If LOOP has been versioned during ifcvt, return the internal call
> >>>      guarding it. */
> >>> *************** pass_slp_vectorize::execute (function *f
> >>> *** 692,697 ****
> >>> --- 719,732 ----
> >>>         scev_initialize ();
> >>>       }
> >>> +   /* Mark all stmts as not belonging to the current region. */
> >>> +   FOR_EACH_BB_FN (bb, fun)
> >>> +     {
> >>> +       for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
> >>> (gsi);
> >>> +            gsi_next (&gsi))
> >>> +         gimple_set_uid (gsi_stmt (gsi), -1);
> >>> +     }
> >>> +
> >>>     init_stmt_vec_info_vec ();
> >>>     FOR_EACH_BB_FN (bb, fun)
> >>> Index: gcc/config/aarch64/aarch64.c
> >>> ===================================================================
> >>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
> >>> +0100
> >>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027
> >>> +0100
> >>> ***************
> >>> *** 52,57 ****
> >>> --- 52,58 ----
> >>>   #include "params.h"
> >>>   #include "gimplify.h"
> >>>   #include "dwarf2.h"
> >>> + #include "gimple-iterator.h"
> >>>   #include "tree-vectorizer.h"
> >>>   #include "aarch64-cost-tables.h"
> >>>   #include "dumpfile.h"
> >>>
> >
> >

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
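The region walk that the new vect_slp_bb performs can be sketched outside of GCC. The following is a minimal illustration only, assuming a simplified model in which each statement is reduced to one flag saying whether data-reference analysis can handle it. The names struct stmt, struct region and split_into_regions are hypothetical and do not exist in GCC; the real code walks gimple_stmt_iterators, calls find_data_references_in_stmt, and retries each region with several vector sizes, none of which is modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a GIMPLE statement: the only property modelled
   is whether data-reference analysis succeeds on it.  */
struct stmt { bool handled; };

/* Half-open range [begin, end) of statement indices, mirroring the
   region_begin/region_end pair where region_end is exclusive.  */
struct region { size_t begin, end; };

/* Split stmts[0..n) into maximal runs of handled statements.  Every
   unhandled statement ends the current run and is skipped.  At most
   max_regions runs are stored in out; the number stored is returned.  */
static size_t
split_into_regions (const struct stmt *stmts, size_t n,
                    struct region *out, size_t max_regions)
{
  assert (stmts != NULL && out != NULL);

  size_t count = 0;
  size_t i = 0;
  while (i < n && count < max_regions)
    {
      size_t begin = i;

      /* Extend the region until the first statement we cannot handle.  */
      while (i < n && stmts[i].handled)
        i++;

      /* An empty run means a leading unhandled statement: record nothing.  */
      if (i > begin)
        {
          out[count].begin = begin;
          out[count].end = i;
          count++;
        }

      /* Skip the unhandled statement that ended the run, if any.  */
      if (i < n)
        i++;
    }
  return count;
}
```

For a block shaped like foo in bb-slp-38.c (four stores, a call, four stores), with the call modelled as the one unhandled statement, this yields the two regions [0, 4) and [5, 9), which matches the two "basic block part vectorized" messages the testcase scans for.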
* Re: [PATCH] Make BB vectorizer work on sub-BBs
  2015-11-10 13:03 ` Richard Biener
@ 2015-11-10 15:20 ` Christophe Lyon
From: Christophe Lyon @ 2015-11-10 15:20 UTC
To: Richard Biener; +Cc: Kyrill Tkachov, gcc-patches

On 10 November 2015 at 14:02, Richard Biener <rguenther@suse.de> wrote:
> On Tue, 10 Nov 2015, Christophe Lyon wrote:
>
>> On 6 November 2015 at 12:11, Kyrill Tkachov <kyrylo.tkachov@arm.com> wrote:
>> > Hi Richard,
>> >
>> >
>> > On 06/11/15 11:09, Richard Biener wrote:
>> >>
>> >> On Fri, 6 Nov 2015, Richard Biener wrote:
>> >>
>> >>> The following patch makes the BB vectorizer not only handle BB heads
>> >>> (until the first stmt with a data reference it cannot handle) but
>> >>> arbitrary regions in a BB separated by such stmts.
>> >>>
>> >>> This improves the number of BB vectorizations from 469 to 556
>> >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> >>> 1x481.wrf failing both patched and unpatched (have to update my
>> >>> config used for such experiments it seems ...)
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>> >>>
>> >>> I'm currently re-testing for a cosmetic change I made when writing
>> >>> the changelog.
>> >>>
>> >>> I expected (and there are) some issues with compile-time. Left
>> >>> is unpatched and right is patched.
>> >>>
>> >>> '403.gcc': 00:00:54 (54)         | '403.gcc': 00:00:55 (55)
>> >>> '483.xalancbmk': 00:02:20 (140)  | '483.xalancbmk': 00:02:24 (144)
>> >>> '416.gamess': 00:02:36 (156)     | '416.gamess': 00:02:37 (157)
>> >>> '435.gromacs': 00:00:18 (18)     | '435.gromacs': 00:00:19 (19)
>> >>> '447.dealII': 00:01:31 (91)      | '447.dealII': 00:01:33 (93)
>> >>> '453.povray': 00:04:54 (294)     | '453.povray': 00:08:54 (534)
>> >>> '454.calculix': 00:00:34 (34)    | '454.calculix': 00:00:52 (52)
>> >>> '481.wrf': 00:01:57 (117)        | '481.wrf': 00:01:59 (119)
>> >>>
>> >>> other benchmarks are unchanged. I'm double-checking now that a followup
>> >>> patch I have which re-implements BB vectorization dependence checking
>> >>> fixes this (that's the only quadraticness I know of).
>> >>
>> >> Fixes all but
>> >>
>> >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)
>> >
>> >
>> > Note that povray is currently suffering from PR 68198
>> >
>>
>> Hi,
>>
>> I've also noticed that the new test bb-slp-38 fails on armeb:
>> FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects
>> scan-tree-dump-times slp2 "basic block part vectorized" 2
>> FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block
>> part vectorized" 2
>>
>> I haven't checked in more detail, maybe it's similar to what we
>> discussed in PR65962
>
> Maybe though there is no misalignment involved as far as I can see.
>
> Please open a bug and attach vectorizer dumps.
>
OK, this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68275
> Richard.
>
>> > Kyrill
>> >
>> >
>> >>
>> >> it even improves compile-time on some:
>> >>
>> >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21)
>> >>
>> >> it also increases the number of vectorized BBs to 722.
>> >>
>> >> Needs some work still though.
>> >>
>> >> Richard.
>> >>
>> >>> Richard.
>> >>>
>> >>> 2015-11-06 Richard Biener <rguenther@suse.de>
>> >>>
>> >>>     * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>> >>>     members.
>> >>>     (vect_stmt_in_region_p): Declare.
>> >>>     * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>> >>>     (destroy_bb_vec_info): Likewise.
>> >>>     (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>> >>>     (vect_get_and_check_slp_defs): Likewise.
>> >>>     (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>> >>>     (vect_slp_bb): Likewise.
>> >>>     * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>> >>>     in terms of vect_stmt_in_region_p.
>> >>>     (vect_pattern_recog): Iterate over the BB region.
>> >>>     * tree-vect-stmts.c (vect_is_simple_use): Use
>> >>>     vect_stmt_in_region_p.
>> >>>     * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>> >>>     (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>> >>>
>> >>>     * config/i386/i386.c: Include gimple-iterator.h.
>> >>>     * config/aarch64/aarch64.c: Likewise.
>> >>>
>> >>>     * gcc.dg/vect/bb-slp-38.c: New testcase.
>> >>>
>> >>> Index: gcc/tree-vectorizer.h
>> >>> ===================================================================
>> >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100
>> >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100
>> >>> *************** nested_in_vect_loop_p (struct loop *loop
>> >>> *** 390,395 ****
>> >>> --- 390,397 ----
>> >>>   typedef struct _bb_vec_info : public vec_info
>> >>>   {
>> >>>     basic_block bb;
>> >>> +   gimple_stmt_iterator region_begin;
>> >>> +   gimple_stmt_iterator region_end;
>> >>>   } *bb_vec_info;
>> >>>   #define BB_VINFO_BB(B) (B)->bb
>> >>> *************** void vect_pattern_recog (vec_info *);
>> >>> *** 1085,1089 ****
>> >>> --- 1087,1092 ----
>> >>>   /* In tree-vectorizer.c. */
>> >>>   unsigned vectorize_loops (void);
>> >>>   void vect_destroy_datarefs (vec_info *);
>> >>> + bool vect_stmt_in_region_p (vec_info *, gimple *);
>> >>>   #endif /* GCC_TREE_VECTORIZER_H */
>> >>> Index: gcc/tree-vect-slp.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vect-slp.c.orig 2015-11-05 09:52:00.640227178 +0100
>> >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100
>> >>> *************** vect_get_and_check_slp_defs (vec_info *v
>> >>> *** 209,215 ****
>> >>>     unsigned int i, number_of_oprnds;
>> >>>     gimple *def_stmt;
>> >>>     enum vect_def_type dt = vect_uninitialized_def;
>> >>> -   struct loop *loop = NULL;
>> >>>     bool pattern = false;
>> >>>     slp_oprnd_info oprnd_info;
>> >>>     int first_op_idx = 1;
>> >>> --- 209,214 ----
>> >>> *************** vect_get_and_check_slp_defs (vec_info *v
>> >>> *** 218,226 ****
>> >>>     bool first = stmt_num == 0;
>> >>>     bool second = stmt_num == 1;
>> >>> -   if (is_a <loop_vec_info> (vinfo))
>> >>> -     loop = LOOP_VINFO_LOOP (as_a <loop_vec_info> (vinfo));
>> >>> -
>> >>>     if (is_gimple_call (stmt))
>> >>>       {
>> >>>         number_of_oprnds = gimple_call_num_args (stmt);
>> >>> --- 217,222 ----
>> >>> *************** again:
>> >>> *** 276,286 ****
>> >>>            from the pattern. Check that all the stmts of the node are
>> >>> in the
>> >>>            pattern. */
>> >>>         if (def_stmt && gimple_bb (def_stmt)
>> >>> !           && ((is_a <loop_vec_info> (vinfo)
>> >>> !                && flow_bb_inside_loop_p (loop, gimple_bb (def_stmt)))
>> >>> !               || (is_a <bb_vec_info> (vinfo)
>> >>> !                   && gimple_bb (def_stmt) == as_a <bb_vec_info>
>> >>> (vinfo)->bb
>> >>> !                   && gimple_code (def_stmt) != GIMPLE_PHI))
>> >>>             && vinfo_for_stmt (def_stmt)
>> >>>             && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> >>>             && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> >>> --- 272,278 ----
>> >>>            from the pattern. Check that all the stmts of the node are
>> >>> in the
>> >>>            pattern. */
>> >>>         if (def_stmt && gimple_bb (def_stmt)
>> >>> !           && vect_stmt_in_region_p (vinfo, def_stmt)
>> >>>             && vinfo_for_stmt (def_stmt)
>> >>>             && STMT_VINFO_IN_PATTERN_P (vinfo_for_stmt (def_stmt))
>> >>>             && !STMT_VINFO_RELEVANT (vinfo_for_stmt (def_stmt))
>> >>> *************** vect_detect_hybrid_slp (loop_vec_info lo
>> >>> *** 2076,2091 ****
>> >>>      stmt_vec_info structs for all the stmts in it. */
>> >>>   static bb_vec_info
>> >>> ! new_bb_vec_info (basic_block bb)
>> >>>   {
>> >>>     bb_vec_info res = NULL;
>> >>>     gimple_stmt_iterator gsi;
>> >>>     res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> >>>     res->kind = vec_info::bb;
>> >>>     BB_VINFO_BB (res) = bb;
>> >>> !   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> >>>       {
>> >>>         gimple *stmt = gsi_stmt (gsi);
>> >>>         gimple_set_uid (stmt, 0);
>> >>> --- 2068,2088 ----
>> >>>      stmt_vec_info structs for all the stmts in it. */
>> >>>   static bb_vec_info
>> >>> ! new_bb_vec_info (gimple_stmt_iterator region_begin,
>> >>> !                  gimple_stmt_iterator region_end)
>> >>>   {
>> >>> +   basic_block bb = gsi_bb (region_begin);
>> >>>     bb_vec_info res = NULL;
>> >>>     gimple_stmt_iterator gsi;
>> >>>     res = (bb_vec_info) xcalloc (1, sizeof (struct _bb_vec_info));
>> >>>     res->kind = vec_info::bb;
>> >>>     BB_VINFO_BB (res) = bb;
>> >>> +   res->region_begin = region_begin;
>> >>> +   res->region_end = region_end;
>> >>> !   for (gsi = region_begin; gsi_stmt (gsi) != gsi_stmt (region_end);
>> >>> !        gsi_next (&gsi))
>> >>>       {
>> >>>         gimple *stmt = gsi_stmt (gsi);
>> >>>         gimple_set_uid (stmt, 0);
>> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> >>> *** 2118,2124 ****
>> >>>     bb = BB_VINFO_BB (bb_vinfo);
>> >>> !   for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si))
>> >>>       {
>> >>>         gimple *stmt = gsi_stmt (si);
>> >>>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> >>> --- 2115,2122 ----
>> >>>     bb = BB_VINFO_BB (bb_vinfo);
>> >>> !   for (si = bb_vinfo->region_begin;
>> >>> !        gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next
>> >>> (&si))
>> >>>       {
>> >>>         gimple *stmt = gsi_stmt (si);
>> >>>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>> >>> *************** destroy_bb_vec_info (bb_vec_info bb_vinf
>> >>> *** 2126,2131 ****
>> >>> --- 2124,2132 ----
>> >>>         if (stmt_info)
>> >>>           /* Free stmt_vec_info. */
>> >>>           free_stmt_vec_info (stmt);
>> >>> +
>> >>> +       /* Reset region marker. */
>> >>> +       gimple_set_uid (stmt, -1);
>> >>>       }
>> >>>     vect_destroy_datarefs (bb_vinfo);
>> >>> *************** vect_bb_slp_scalar_cost (basic_block bb,
>> >>> *** 2247,2254 ****
>> >>>             gimple *use_stmt;
>> >>>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>> >>> (def_p))
>> >>>               if (!is_gimple_debug (use_stmt)
>> >>> !                 && (gimple_code (use_stmt) == GIMPLE_PHI
>> >>> !                     || gimple_bb (use_stmt) != bb
>> >>>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>> >>> (use_stmt))))
>> >>>                 {
>> >>>                   (*life)[i] = true;
>> >>> --- 2248,2255 ----
>> >>>             gimple *use_stmt;
>> >>>             FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, DEF_FROM_PTR
>> >>> (def_p))
>> >>>               if (!is_gimple_debug (use_stmt)
>> >>> !                 && (! vect_stmt_in_region_p (vinfo_for_stmt
>> >>> (stmt)->vinfo,
>> >>> !                                              use_stmt)
>> >>>                       || !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt
>> >>> (use_stmt))))
>> >>>                 {
>> >>>                   (*life)[i] = true;
>> >>> *************** vect_bb_vectorization_profitable_p (bb_v
>> >>> *** 2327,2366 ****
>> >>>   /* Check if the basic block can be vectorized. */
>> >>>   static bb_vec_info
>> >>> ! vect_slp_analyze_bb_1 (basic_block bb)
>> >>>   {
>> >>>     bb_vec_info bb_vinfo;
>> >>>     vec<slp_instance> slp_instances;
>> >>>     slp_instance instance;
>> >>>     int i;
>> >>>     int min_vf = 2;
>> >>> -   unsigned n_stmts = 0;
>> >>> !   bb_vinfo = new_bb_vec_info (bb);
>> >>>     if (!bb_vinfo)
>> >>>       return NULL;
>> >>> !   /* Gather all data references in the basic-block. */
>> >>> !
>> >>> !   for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>> >>> !        !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> !     {
>> >>> !       gimple *stmt = gsi_stmt (gsi);
>> >>> !       if (is_gimple_debug (stmt))
>> >>> !         continue;
>> >>> !       ++n_stmts;
>> >>> !       if (!find_data_references_in_stmt (NULL, stmt,
>> >>> !                                          &BB_VINFO_DATAREFS (bb_vinfo)))
>> >>> !         {
>> >>> !           /* Mark the rest of the basic-block as unvectorizable. */
>> >>> !           for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> !             {
>> >>> !               stmt = gsi_stmt (gsi);
>> >>> !               STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt)) = false;
>> >>> !             }
>> >>> !           break;
>> >>> !         }
>> >>> !     }
>> >>>     /* Analyze the data references. */
>> >>> --- 2328,2358 ----
>> >>>   /* Check if the basic block can be vectorized. */
>> >>>   static bb_vec_info
>> >>> ! vect_slp_analyze_bb_1 (gimple_stmt_iterator region_begin,
>> >>> !                        gimple_stmt_iterator region_end,
>> >>> !                        vec<data_reference_p> datarefs, int n_stmts)
>> >>>   {
>> >>>     bb_vec_info bb_vinfo;
>> >>>     vec<slp_instance> slp_instances;
>> >>>     slp_instance instance;
>> >>>     int i;
>> >>>     int min_vf = 2;
>> >>> !   if (n_stmts > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> >>> !     {
>> >>> !       if (dump_enabled_p ())
>> >>> !         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>> !                          "not vectorized: too many instructions in "
>> >>> !                          "basic block.\n");
>> >>> !       free_data_refs (datarefs);
>> >>> !       return NULL;
>> >>> !     }
>> >>> !
>> >>> !   bb_vinfo = new_bb_vec_info (region_begin, region_end);
>> >>>     if (!bb_vinfo)
>> >>>       return NULL;
>> >>> !   BB_VINFO_DATAREFS (bb_vinfo) = datarefs;
>> >>>     /* Analyze the data references. */
>> >>> *************** vect_slp_analyze_bb_1 (basic_block bb)
>> >>> *** 2438,2445 ****
>> >>>       }
>> >>>     /* Mark all the statements that we do not want to vectorize. */
>> >>> !   for (gimple_stmt_iterator gsi = gsi_start_bb (BB_VINFO_BB
>> >>> (bb_vinfo));
>> >>> !        !gsi_end_p (gsi); gsi_next (&gsi))
>> >>>       {
>> >>>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> >>>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> >>> --- 2430,2437 ----
>> >>>       }
>> >>>     /* Mark all the statements that we do not want to vectorize. */
>> >>> !   for (gimple_stmt_iterator gsi = bb_vinfo->region_begin;
>> >>> !        gsi_stmt (gsi) != gsi_stmt (bb_vinfo->region_end); gsi_next
>> >>> (&gsi))
>> >>>       {
>> >>>         stmt_vec_info vinfo = vinfo_for_stmt (gsi_stmt (gsi));
>> >>>         if (STMT_SLP_TYPE (vinfo) != pure_slp)
>> >>> *************** bool
>> >>> *** 2509,2585 ****
>> >>>   vect_slp_bb (basic_block bb)
>> >>>   {
>> >>>     bb_vec_info bb_vinfo;
>> >>> -   int insns = 0;
>> >>>     gimple_stmt_iterator gsi;
>> >>>     unsigned int vector_sizes;
>> >>>     if (dump_enabled_p ())
>> >>>       dump_printf_loc (MSG_NOTE, vect_location,
>> >>> "===vect_slp_analyze_bb===\n");
>> >>> -   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>> >>> -     {
>> >>> -       gimple *stmt = gsi_stmt (gsi);
>> >>> -       if (!is_gimple_debug (stmt)
>> >>> -           && !gimple_nop_p (stmt)
>> >>> -           && gimple_code (stmt) != GIMPLE_LABEL)
>> >>> -         insns++;
>> >>> -       if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> >>> -         vect_location = gimple_location (stmt);
>> >>> -     }
>> >>> -
>> >>> -   if (insns > PARAM_VALUE (PARAM_SLP_MAX_INSNS_IN_BB))
>> >>> -     {
>> >>> -       if (dump_enabled_p ())
>> >>> -         dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>> >>> -                          "not vectorized: too many instructions in "
>> >>> -                          "basic block.\n");
>> >>> -
>> >>> -       return false;
>> >>> -     }
>> >>> -
>> >>>     /* Autodetect first vector size we try. */
>> >>>     current_vector_size = 0;
>> >>>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>>     while (1)
>> >>>       {
>> >>> !       bb_vinfo = vect_slp_analyze_bb_1 (bb);
>> >>> !       if (bb_vinfo)
>> >>>           {
>> >>> !           if (!dbg_cnt (vect_slp))
>> >>> !             {
>> >>> !               destroy_bb_vec_info (bb_vinfo);
>> >>> !               return false;
>> >>> !             }
>> >>>             if (dump_enabled_p ())
>> >>> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB\n");
>> >>>             vect_schedule_slp (bb_vinfo);
>> >>>             if (dump_enabled_p ())
>> >>>               dump_printf_loc (MSG_NOTE, vect_location,
>> >>> !                              "BASIC BLOCK VECTORIZED\n");
>> >>>             destroy_bb_vec_info (bb_vinfo);
>> >>> !           return true;
>> >>>           }
>> >>> !       destroy_bb_vec_info (bb_vinfo);
>> >>>         vector_sizes &= ~current_vector_size;
>> >>> !       if (vector_sizes == 0
>> >>> !           || current_vector_size == 0)
>> >>> !         return false;
>> >>> !       /* Try the next biggest vector size. */
>> >>> !       current_vector_size = 1 << floor_log2 (vector_sizes);
>> >>> !       if (dump_enabled_p ())
>> >>> !         dump_printf_loc (MSG_NOTE, vect_location,
>> >>> !                          "***** Re-trying analysis with "
>> >>> !                          "vector size %d\n", current_vector_size);
>> >>>       }
>> >>>   }
>> >>> --- 2501,2605 ----
>> >>>   vect_slp_bb (basic_block bb)
>> >>>   {
>> >>>     bb_vec_info bb_vinfo;
>> >>>     gimple_stmt_iterator gsi;
>> >>>     unsigned int vector_sizes;
>> >>> +   bool any_vectorized = false;
>> >>>     if (dump_enabled_p ())
>> >>>       dump_printf_loc (MSG_NOTE, vect_location,
>> >>> "===vect_slp_analyze_bb===\n");
>> >>>     /* Autodetect first vector size we try. */
>> >>>     current_vector_size = 0;
>> >>>     vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>> +   gsi = gsi_start_bb (bb);
>> >>> +
>> >>>     while (1)
>> >>>       {
>> >>> !       if (gsi_end_p (gsi))
>> >>> !         break;
>> >>> !
>> >>> !       gimple_stmt_iterator region_begin = gsi;
>> >>> !       vec<data_reference_p> datarefs = vNULL;
>> >>> !       int insns = 0;
>> >>> !
>> >>> !       for (; !gsi_end_p (gsi); gsi_next (&gsi))
>> >>>           {
>> >>> !           gimple *stmt = gsi_stmt (gsi);
>> >>> !           if (is_gimple_debug (stmt))
>> >>> !             continue;
>> >>> !           insns++;
>> >>> !
>> >>> !           if (gimple_location (stmt) != UNKNOWN_LOCATION)
>> >>> !             vect_location = gimple_location (stmt);
>> >>> !
>> >>> !           if (!find_data_references_in_stmt (NULL, stmt, &datarefs))
>> >>> !             break;
>> >>> !         }
>> >>> !
>> >>> !       /* Skip leading unhandled stmts. */
>> >>> !       if (gsi_stmt (region_begin) == gsi_stmt (gsi))
>> >>> !         {
>> >>> !           gsi_next (&gsi);
>> >>> !           continue;
>> >>> !         }
>> >>> !
>> >>> !       gimple_stmt_iterator region_end = gsi;
>> >>> +       bool vectorized = false;
>> >>> +       bb_vinfo = vect_slp_analyze_bb_1 (region_begin, region_end,
>> >>> +                                         datarefs, insns);
>> >>> +       if (bb_vinfo
>> >>> +           && dbg_cnt (vect_slp))
>> >>> +         {
>> >>>             if (dump_enabled_p ())
>> >>> !             dump_printf_loc (MSG_NOTE, vect_location, "SLPing BB
>> >>> part\n");
>> >>>             vect_schedule_slp (bb_vinfo);
>> >>>             if (dump_enabled_p ())
>> >>>               dump_printf_loc (MSG_NOTE, vect_location,
>> >>> !                              "basic block part vectorized\n");
>> >>>             destroy_bb_vec_info (bb_vinfo);
>> >>> !           vectorized = true;
>> >>>           }
>> >>> +       else
>> >>> +         destroy_bb_vec_info (bb_vinfo);
>> >>> !       any_vectorized |= vectorized;
>> >>>         vector_sizes &= ~current_vector_size;
>> >>> !       if (vectorized
>> >>> !           || vector_sizes == 0
>> >>> !           || current_vector_size == 0)
>> >>> !         {
>> >>> !           if (gsi_end_p (region_end))
>> >>> !             break;
>> >>> !
>> >>> !           /* Skip the unhandled stmt. */
>> >>> !           gsi_next (&gsi);
>> >>> !
>> >>> !           /* And reset vector sizes. */
>> >>> !           current_vector_size = 0;
>> >>> !           vector_sizes = targetm.vectorize.autovectorize_vector_sizes ();
>> >>> !         }
>> >>> !       else
>> >>> !         {
>> >>> !           /* Try the next biggest vector size. */
>> >>> !           current_vector_size = 1 << floor_log2 (vector_sizes);
>> >>> !           if (dump_enabled_p ())
>> >>> !             dump_printf_loc (MSG_NOTE, vect_location,
>> >>> !                              "***** Re-trying analysis with "
>> >>> !                              "vector size %d\n", current_vector_size);
>> >>> !           /* Start over. */
>> >>> !           gsi = region_begin;
>> >>> !         }
>> >>>       }
>> >>> +
>> >>> +   return any_vectorized;
>> >>>   }
>> >>> Index: gcc/tree-vect-patterns.c
>> >>> ===================================================================
>> >>> *** gcc/tree-vect-patterns.c.orig 2015-11-05 09:52:00.640227178
>> >>> +0100
>> >>> --- gcc/tree-vect-patterns.c 2015-11-05 13:25:46.060011765 +0100
>> >>> *************** static bool
>> >>> *** 107,133 ****
>> >>>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> >>>   {
>> >>>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> >>> !   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
>> >>> !   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_vinfo);
>> >>> !
>> >>> !   if (!gimple_bb (stmt2))
>> >>> !     return false;
>> >>> !
>> >>> !   if (loop_vinfo)
>> >>> !     {
>> >>> !       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> !       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt2)))
>> >>> !         return false;
>> >>> !     }
>> >>> !   else
>> >>> !     {
>> >>> !       if (gimple_bb (stmt2) != BB_VINFO_BB (bb_vinfo)
>> >>> !           || gimple_code (stmt2) == GIMPLE_PHI)
>> >>> !         return false;
>> >>> !     }
>> >>> !
>> >>> !   gcc_assert (vinfo_for_stmt (stmt2));
>> >>> !   return true;
>> >>>   }
>> >>>   /* If the LHS of DEF_STMT has a single use, and that statement is
>> >>> --- 107,113 ----
>> >>>   vect_same_loop_or_bb_p (gimple *stmt1, gimple *stmt2)
>> >>>   {
>> >>>     stmt_vec_info stmt_vinfo = vinfo_for_stmt (stmt1);
>> >>> !   return vect_stmt_in_region_p (stmt_vinfo->vinfo, stmt2);
>> >>>   }
>> >>>   /* If the LHS of DEF_STMT has a single use, and that statement is
>> >>> *************** vect_pattern_recog (vec_info *vinfo)
>> >>> *** 3611,3643 ****
>> >>>         loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>>         bbs = LOOP_VINFO_BBS (loop_vinfo);
>> >>>         nbbs = loop->num_nodes;
>> >>>       }
>> >>>     else
>> >>>       {
>> >>> !       bbs = &as_a <bb_vec_info> (vinfo)->bb;
>> >>> !       nbbs = 1;
>> >>> !     }
>> >>> !
>> >>> !   /* Scan through the loop stmts, applying the pattern recognition
>> >>> !      functions starting at each stmt visited: */
>> >>> ! 
for (i = 0; i < nbbs; i++) >> >>> ! { >> >>> ! basic_block bb = bbs[i]; >> >>> ! for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >> >>> ! { >> >>> ! if (is_a <bb_vec_info> (vinfo) >> >>> ! && (stmt = gsi_stmt (si)) >> >>> && vinfo_for_stmt (stmt) >> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >> >>> ! continue; >> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. >> >>> */ >> >>> ! for (j = 0; j < NUM_PATTERNS; j++) >> >>> ! { >> >>> vect_recog_func = vect_vect_recog_func_ptrs[j]; >> >>> vect_pattern_recog_1 (vect_recog_func, si, >> >>> &stmts_to_replace); >> >>> ! } >> >>> ! } >> >>> } >> >>> } >> >>> --- 3591,3632 ---- >> >>> loop = LOOP_VINFO_LOOP (loop_vinfo); >> >>> bbs = LOOP_VINFO_BBS (loop_vinfo); >> >>> nbbs = loop->num_nodes; >> >>> + >> >>> + /* Scan through the loop stmts, applying the pattern recognition >> >>> + functions starting at each stmt visited: */ >> >>> + for (i = 0; i < nbbs; i++) >> >>> + { >> >>> + basic_block bb = bbs[i]; >> >>> + for (si = gsi_start_bb (bb); !gsi_end_p (si); gsi_next (&si)) >> >>> + { >> >>> + /* Scan over all generic vect_recog_xxx_pattern functions. >> >>> */ >> >>> + for (j = 0; j < NUM_PATTERNS; j++) >> >>> + { >> >>> + vect_recog_func = vect_vect_recog_func_ptrs[j]; >> >>> + vect_pattern_recog_1 (vect_recog_func, si, >> >>> + &stmts_to_replace); >> >>> + } >> >>> + } >> >>> + } >> >>> } >> >>> else >> >>> { >> >>> ! bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo); >> >>> ! for (si = bb_vinfo->region_begin; >> >>> ! gsi_stmt (si) != gsi_stmt (bb_vinfo->region_end); gsi_next >> >>> (&si)) >> >>> ! { >> >>> ! if ((stmt = gsi_stmt (si)) >> >>> && vinfo_for_stmt (stmt) >> >>> && !STMT_VINFO_VECTORIZABLE (vinfo_for_stmt (stmt))) >> >>> ! continue; >> >>> ! /* Scan over all generic vect_recog_xxx_pattern functions. */ >> >>> ! for (j = 0; j < NUM_PATTERNS; j++) >> >>> ! 
{ >> >>> vect_recog_func = vect_vect_recog_func_ptrs[j]; >> >>> vect_pattern_recog_1 (vect_recog_func, si, >> >>> &stmts_to_replace); >> >>> ! } >> >>> ! } >> >>> } >> >>> } >> >>> Index: gcc/config/i386/i386.c >> >>> =================================================================== >> >>> *** gcc/config/i386/i386.c.orig 2015-11-05 09:52:42.239687133 +0100 >> >>> --- gcc/config/i386/i386.c 2015-11-05 11:09:09.451774562 +0100 >> >>> *************** along with GCC; see the file COPYING3. >> >>> *** 64,69 **** >> >>> --- 64,70 ---- >> >>> #include "context.h" >> >>> #include "pass_manager.h" >> >>> #include "target-globals.h" >> >>> + #include "gimple-iterator.h" >> >>> #include "tree-vectorizer.h" >> >>> #include "shrink-wrap.h" >> >>> #include "builtins.h" >> >>> Index: gcc/testsuite/gcc.dg/vect/bb-slp-38.c >> >>> =================================================================== >> >>> *** /dev/null 1970-01-01 00:00:00.000000000 +0000 >> >>> --- gcc/testsuite/gcc.dg/vect/bb-slp-38.c 2015-11-05 >> >>> 14:00:48.177644327 +0100 >> >>> *************** >> >>> *** 0 **** >> >>> --- 1,44 ---- >> >>> + /* { dg-require-effective-target vect_int } */ >> >>> + >> >>> + #include "tree-vect.h" >> >>> + >> >>> + extern void abort (void); >> >>> + >> >>> + int a[8], b[8]; >> >>> + int x; >> >>> + >> >>> + void __attribute__((noinline,noclone)) >> >>> + bar (void) >> >>> + { >> >>> + x = 1; >> >>> + } >> >>> + >> >>> + void __attribute__((noinline,noclone)) >> >>> + foo(void) >> >>> + { >> >>> + a[0] = b[0]; >> >>> + a[1] = b[0]; >> >>> + a[2] = b[3]; >> >>> + a[3] = b[3]; >> >>> + bar (); >> >>> + a[4] = b[4]; >> >>> + a[5] = b[7]; >> >>> + a[6] = b[4]; >> >>> + a[7] = b[7]; >> >>> + } >> >>> + >> >>> + int main() >> >>> + { >> >>> + int i; >> >>> + check_vect (); >> >>> + for (i = 0; i < 8; ++i) >> >>> + b[i] = i; >> >>> + foo (); >> >>> + if (a[0] != 0 || a[1] != 0 || a[2] != 3 || a[3] != 3 >> >>> + || a[4] != 4 || a[5] != 7 || a[6] != 4 || a[7] != 7) >> >>> + abort (); >> 
>>> + return 0; >> >>> + } >> >>> + >> >>> + /* { dg-final { scan-tree-dump "basic block vectorized" "slp2" { target >> >>> vect_perm } } } */ >> >>> + /* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 >> >>> "slp2" { target vect_perm } } } */ >> >>> Index: gcc/tree-vect-stmts.c >> >>> =================================================================== >> >>> *** gcc/tree-vect-stmts.c.orig 2015-11-02 12:37:11.074249388 +0100 >> >>> --- gcc/tree-vect-stmts.c 2015-11-05 13:29:21.413423692 +0100 >> >>> *************** vect_is_simple_use (tree operand, vec_in >> >>> *** 8196,8207 **** >> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >> >>> } >> >>> ! basic_block bb = gimple_bb (*def_stmt); >> >>> ! if ((is_a <loop_vec_info> (vinfo) >> >>> ! && !flow_bb_inside_loop_p (as_a <loop_vec_info> (vinfo)->loop, >> >>> bb)) >> >>> ! || (is_a <bb_vec_info> (vinfo) >> >>> ! && (bb != as_a <bb_vec_info> (vinfo)->bb >> >>> ! || gimple_code (*def_stmt) == GIMPLE_PHI))) >> >>> *dt = vect_external_def; >> >>> else >> >>> { >> >>> --- 8196,8202 ---- >> >>> dump_gimple_stmt (MSG_NOTE, TDF_SLIM, *def_stmt, 0); >> >>> } >> >>> ! if (! vect_stmt_in_region_p (vinfo, *def_stmt)) >> >>> *dt = vect_external_def; >> >>> else >> >>> { >> >>> Index: gcc/tree-vectorizer.c >> >>> =================================================================== >> >>> *** gcc/tree-vectorizer.c.orig 2015-11-04 09:23:53.724687806 +0100 >> >>> --- gcc/tree-vectorizer.c 2015-11-05 13:55:08.299817570 +0100 >> >>> *************** vect_destroy_datarefs (vec_info *vinfo) >> >>> *** 350,355 **** >> >>> --- 350,382 ---- >> >>> } >> >>> + /* Return whether STMT is inside the region we try to vectorize. 
>> >>> */
>> >>> +
>> >>> + bool
>> >>> + vect_stmt_in_region_p (vec_info *vinfo, gimple *stmt)
>> >>> + {
>> >>> +   if (!gimple_bb (stmt))
>> >>> +     return false;
>> >>> +
>> >>> +   if (loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo))
>> >>> +     {
>> >>> +       struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
>> >>> +       if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
>> >>> +         return false;
>> >>> +     }
>> >>> +   else
>> >>> +     {
>> >>> +       bb_vec_info bb_vinfo = as_a <bb_vec_info> (vinfo);
>> >>> +       if (gimple_bb (stmt) != BB_VINFO_BB (bb_vinfo)
>> >>> +           || gimple_uid (stmt) == -1U
>> >>> +           || gimple_code (stmt) == GIMPLE_PHI)
>> >>> +         return false;
>> >>> +     }
>> >>> +
>> >>> +   return true;
>> >>> + }
>> >>> +
>> >>> +
>> >>>   /* If LOOP has been versioned during ifcvt, return the internal call
>> >>>      guarding it. */
>> >>> *************** pass_slp_vectorize::execute (function *f
>> >>> *** 692,697 ****
>> >>> --- 719,732 ----
>> >>>         scev_initialize ();
>> >>>       }
>> >>> +   /* Mark all stmts as not belonging to the current region. */
>> >>> +   FOR_EACH_BB_FN (bb, fun)
>> >>> +     {
>> >>> +       for (gimple_stmt_iterator gsi = gsi_start_bb (bb); !gsi_end_p
>> >>> (gsi);
>> >>> +            gsi_next (&gsi))
>> >>> +         gimple_set_uid (gsi_stmt (gsi), -1);
>> >>> +     }
>> >>> +
>> >>>     init_stmt_vec_info_vec ();
>> >>>     FOR_EACH_BB_FN (bb, fun)
>> >>> Index: gcc/config/aarch64/aarch64.c
>> >>> ===================================================================
>> >>> *** gcc/config/aarch64/aarch64.c.orig 2015-10-28 11:22:25.290823112
>> >>> +0100
>> >>> --- gcc/config/aarch64/aarch64.c 2015-11-06 10:24:21.539818027
>> >>> +0100
>> >>> ***************
>> >>> *** 52,57 ****
>> >>> --- 52,58 ----
>> >>>   #include "params.h"
>> >>>   #include "gimplify.h"
>> >>>   #include "dwarf2.h"
>> >>> + #include "gimple-iterator.h"
>> >>>   #include "tree-vectorizer.h"
>> >>>   #include "aarch64-cost-tables.h"
>> >>>   #include "dumpfile.h"
>> >>>
>> >
>> >>
>
> --
> Richard Biener <rguenther@suse.de>
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)
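[Editorial sketch, not part of the quoted patch.] The vector-size retry in the quoted vect_slp_bb hunk (mask out the size just tried, then pick the largest remaining one) can be modelled in isolation. The following is a minimal C++ sketch under stated assumptions: `sizes_tried`, `highest_bit` and the `analyze` callback are hypothetical names that do not exist in GCC, and "autodetect" is assumed to pick the largest size in the mask, whereas in GCC that choice is made by the target.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Index of the highest set bit, or -1 for 0.  Hypothetical stand-in for
// GCC's floor_log2; this is not the real implementation.
static int
highest_bit (unsigned x)
{
  int r = -1;
  while (x != 0)
    {
      x >>= 1;
      ++r;
    }
  return r;
}

// Return the vector sizes tried for one region, in order.  ANALYZE is a
// hypothetical callback standing in for vect_slp_analyze_bb_1: it receives
// the size in bytes and reports whether analysis succeeded.
// ASSUMPTION: "autodetect" (current size 0) picks the largest size in
// VECTOR_SIZES; in GCC the target decides this.
std::vector<unsigned>
sizes_tried (unsigned vector_sizes,
             const std::function<bool (unsigned)> &analyze)
{
  std::vector<unsigned> tried;
  unsigned current = 0;
  while (true)
    {
      unsigned used = current;
      if (used == 0 && vector_sizes != 0)
        used = 1u << highest_bit (vector_sizes);
      tried.push_back (used);
      if (analyze (used))
        break;

      // Mirror the patch: drop the size just tried from the mask.
      current = used;
      vector_sizes &= ~current;
      if (vector_sizes == 0 || current == 0)
        break;

      // Try the next biggest vector size.
      current = 1u << highest_bit (vector_sizes);
    }
  return tried;
}
```

With a mask of `32 | 16` and an `analyze` that only accepts 16, the sketch tries 32 and then 16. In the patch itself the same loop additionally rewinds `gsi` to `region_begin` before each retry and resets the mask when moving to the next region; that part is left out here.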
* Re: [PATCH] Make BB vectorizer work on sub-BBs
  2015-11-06 11:10 ` Richard Biener
  2015-11-06 11:12   ` Kyrill Tkachov
@ 2015-11-06 16:13   ` Jeff Law
  1 sibling, 0 replies; 8+ messages in thread
From: Jeff Law @ 2015-11-06 16:13 UTC
To: Richard Biener, gcc-patches

On 11/06/2015 04:09 AM, Richard Biener wrote:
> On Fri, 6 Nov 2015, Richard Biener wrote:
>
>>
>> The following patch makes the BB vectorizer not only handle BB heads
>> (until the first stmt with a data reference it cannot handle) but
>> arbitrary regions in a BB separated by such stmts.
>>
>> This improves the number of BB vectorizations from 469 to 556
>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> 1x481.wrf failing both patched and unpatched (have to update my
>> config used for such experiments it seems ...)
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>>
>> I'm currently re-testing for a cosmetic change I made when writing
>> the changelog.
>>
>> I expected (and there are) some issues with compile-time. Left
>> is unpatched and right is patched.
>>
>> '403.gcc':       00:00:54 (54)  | '403.gcc':       00:00:55 (55)
>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144)
>> '416.gamess':    00:02:36 (156) | '416.gamess':    00:02:37 (157)
>> '435.gromacs':   00:00:18 (18)  | '435.gromacs':   00:00:19 (19)
>> '447.dealII':    00:01:31 (91)  | '447.dealII':    00:01:33 (93)
>> '453.povray':    00:04:54 (294) | '453.povray':    00:08:54 (534)
>> '454.calculix':  00:00:34 (34)  | '454.calculix':  00:00:52 (52)
>> '481.wrf':       00:01:57 (117) | '481.wrf':       00:01:59 (119)
>>
>> other benchmarks are unchanged. I'm double-checking now that a followup
>> patch I have which re-implements BB vectorization dependence checking
>> fixes this (that's the only quadraticness I know of).
>
> Fixes all but
>
> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406)

453.povray is mine, related to the FSM bits.

jeff
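[Editorial sketch, not part of the thread.] The quoted description ("arbitrary regions in a BB separated by such stmts") amounts to splitting a statement sequence at every statement the vectorizer cannot handle. A minimal C++ model follows, assuming each statement is reduced to a single boolean flag; `split_regions` and `Region` are hypothetical names, not GCC functions, and debug statements and instruction counting are ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Half-open index range [first, second) of statements in one region.
using Region = std::pair<std::size_t, std::size_t>;

// Split a basic block into maximal runs of handled statements.
// HANDLED[i] is true when statement i has data references the vectorizer
// can analyze (an assumption standing in for find_data_references_in_stmt
// succeeding in the patch).
std::vector<Region>
split_regions (const std::vector<bool> &handled)
{
  std::vector<Region> regions;
  std::size_t i = 0;
  while (i < handled.size ())
    {
      std::size_t begin = i;
      while (i < handled.size () && handled[i])
        ++i;

      // Skip leading unhandled stmts: they form no region.
      if (i == begin)
        {
          ++i;
          continue;
        }

      regions.push_back (Region (begin, i));

      // Skip the unhandled stmt that ended this region, if any.
      if (i < handled.size ())
        ++i;
    }
  return regions;
}
```

Modelled on the shape of bb-slp-38.c from the patch (four stores, a call to `bar ()`, four more stores, with the call assumed to be the unhandled statement), the flags `{1,1,1,1,0,1,1,1,1}` give two regions, `[0,4)` and `[5,9)`, which lines up with the testcase expecting "basic block part vectorized" twice.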
Thread overview: 8+ messages
2015-11-06 9:43 [PATCH] Make BB vectorizer work on sub-BBs Richard Biener
2015-11-06 11:10 ` Richard Biener
2015-11-06 11:12 ` Kyrill Tkachov
2015-11-06 11:27 ` Richard Biener
2015-11-10 12:56 ` Christophe Lyon
2015-11-10 13:03 ` Richard Biener
2015-11-10 15:20 ` Christophe Lyon
2015-11-06 16:13 ` Jeff Law