From: "Andre Vieira (lists)"
Subject: [PATCH 1/2][vect]PR 88915: Vectorize epilogues when versioning loops
To: gcc-patches , Richard Biener
Message-ID: <385547e6-abbd-3633-ad69-d4fb6e604c97@arm.com>
Date: Fri, 23 Aug 2019 17:17:00 -0000
MIME-Version: 1.0
Content-Type: multipart/mixed;
 boundary="------------0993C3C852F7FE3F3E246957"

This is a multi-part message in MIME format.
--------------0993C3C852F7FE3F3E246957
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 7bit
Content-length: 1990

Hi,

This patch is an improvement on my last RFC.
As you pointed out, we can do the vectorization analysis of the
epilogues before doing the transformation, using the same approach as is
used for OpenMP simd.

I have not yet incorporated the cost tweaks for vectorizing the epilogue;
I would like to do this in a subsequent patch, to make it easier to test
the differences.

I currently disable the vectorization of epilogues when versioning for
the number of iterations.  This is simply because I do not completely
understand how the assumptions are created and couldn't determine
whether using skip_vector with this would work.  If you don't think it
is a problem, or have a testcase showing that it works, I would gladly
look at it.

Bootstrapped this and the next patch on x86_64 and
aarch64-unknown-linux-gnu, with no regressions (after the test changes
in the next patch).

gcc/ChangeLog:
2019-08-23  Andre Vieira

	PR 88915
	* gengtype.c (main): Add poly_uint64 type to generator.
	* tree-vect-loop.c (vect_analyze_loop_2): Make it determine whether
	we vectorize epilogue loops.
	(vect_analyze_loop): Idem.
	(vect_transform_loop): Pass decision to vectorize epilogues to
	vect_do_peeling.
	* tree-vect-loop-manip.c (vect_do_peeling): Enable skip_vector when
	doing loop versioning if we decided to vectorize epilogues.
	(vect_loop_versioning): Move the decision on whether to check
	profitability into this function, based on the cost model.
	* tree-vectorizer.h (vect_loop_versioning, vect_do_peeling,
	vect_analyze_loop, vect_transform_loop): Update declarations.
	* tree-vectorizer.c: Include params.h.
	(try_vectorize_loop_1): Initialize vect_epilogues_nomask to
	PARAM_VECT_EPILOGUES_NOMASK and pass it to vect_analyze_loop and
	vect_transform_loop.  Also make sure vectorizing epilogues does not
	count towards the number of vectorized loops.
--------------0993C3C852F7FE3F3E246957
Content-Type: text/x-patch;
 name="epilogues_1.patch"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment;
 filename="epilogues_1.patch"
Content-length: 14203

diff --git a/gcc/gengtype.c b/gcc/gengtype.c
index 53317337cf8c8e8caefd6b819d28b3bba301e755..56ffa08a7dee54837441f0c743f8c0faa285c74b 100644
--- a/gcc/gengtype.c
+++ b/gcc/gengtype.c
@@ -5197,6 +5197,7 @@ main (int argc, char **argv)
   POS_HERE (do_scalar_typedef ("widest_int", &pos));
   POS_HERE (do_scalar_typedef ("int64_t", &pos));
   POS_HERE (do_scalar_typedef ("poly_int64", &pos));
+  POS_HERE (do_scalar_typedef ("poly_uint64", &pos));
   POS_HERE (do_scalar_typedef ("uint64_t", &pos));
   POS_HERE (do_scalar_typedef ("uint8", &pos));
   POS_HERE (do_scalar_typedef ("uintptr_t", &pos));
diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
index 5c25441c70a271f04730486e513437fffa75b7e3..3b5f14c45b5b9b601120c6776734bbafefe1e178 100644
--- a/gcc/tree-vect-loop-manip.c
+++ b/gcc/tree-vect-loop-manip.c
@@ -2401,7 +2401,8 @@ class loop *
 vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
                  tree *niters_vector, tree *step_vector,
                  tree *niters_vector_mult_vf_var, int th,
-                 bool check_profitability, bool niters_no_overflow)
+                 bool check_profitability, bool niters_no_overflow,
+                 bool vect_epilogues_nomask)
 {
   edge e, guard_e;
   tree type = TREE_TYPE (niters), guard_cond;
@@ -2474,7 +2475,8 @@ vect_do_peeling (loop_vec_info loop_vinfo, tree niters, tree nitersm1,
   bool skip_vector = (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                       ? maybe_lt (LOOP_VINFO_INT_NITERS (loop_vinfo),
                                   bound_prolog + bound_epilog)
-                      : !LOOP_REQUIRES_VERSIONING (loop_vinfo));
+                      : (!LOOP_REQUIRES_VERSIONING (loop_vinfo)
+                         || vect_epilogues_nomask));
   /* Epilog loop must be executed if the number of iterations for epilog
      loop is known at compile time, otherwise we need to add a check at
      the end of vector loop and skip to the end of epilog loop.  */
@@ -2966,9 +2968,7 @@ vect_create_cond_for_alias_checks (loop_vec_info loop_vinfo, tree * cond_expr)
    *COND_EXPR_STMT_LIST.  */
 
 class loop *
-vect_loop_versioning (loop_vec_info loop_vinfo,
-                      unsigned int th, bool check_profitability,
-                      poly_uint64 versioning_threshold)
+vect_loop_versioning (loop_vec_info loop_vinfo)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo), *nloop;
   class loop *scalar_loop = LOOP_VINFO_SCALAR_LOOP (loop_vinfo);
@@ -2988,10 +2988,15 @@ vect_loop_versioning (loop_vec_info loop_vinfo,
   bool version_align = LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT (loop_vinfo);
   bool version_alias = LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo);
   bool version_niter = LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
+  poly_uint64 versioning_threshold
+    = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
   tree version_simd_if_cond
     = LOOP_REQUIRES_VERSIONING_FOR_SIMD_IF_COND (loop_vinfo);
+  unsigned th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
-  if (check_profitability)
+  if (th >= vect_vf_for_cost (loop_vinfo)
+      && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+      && !ordered_p (th, versioning_threshold))
     cond_expr = fold_build2 (GE_EXPR, boolean_type_node, scalar_loop_iters,
                              build_int_cst (TREE_TYPE (scalar_loop_iters),
                                             th - 1));
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b0cbbac0cb5ba1ffce706715d3dbb9139063803d..305ee2b06eabde9091049da829e6fc93161aa13f 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1858,7 +1858,8 @@ vect_dissolve_slp_only_groups (loop_vec_info loop_vinfo)
    for it.  The different analyses will record information in the
    loop_vec_info struct.  */
 static opt_result
-vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, unsigned *n_stmts)
+vect_analyze_loop_2 (loop_vec_info loop_vinfo, bool &fatal, unsigned *n_stmts,
+                     bool *vect_epilogues_nomask)
 {
   opt_result ok = opt_result::success ();
   int res;
@@ -2179,6 +2180,11 @@ start_over:
         }
     }
 
+  /* Disable epilogue vectorization if versioning is required because of the
+     iteration count.  TODO: Needs investigation as to whether it is possible
+     to vectorize epilogues in this case.  */
+  *vect_epilogues_nomask &= !LOOP_REQUIRES_VERSIONING_FOR_NITERS (loop_vinfo);
+
   /* During peeling, we need to check if number of loop iterations is
      enough for both peeled prolog loop and vector loop.  This check
      can be merged along with threshold check of loop versioning, so
@@ -2186,6 +2192,7 @@ start_over:
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
     {
       poly_uint64 niters_th = 0;
+      unsigned int th = LOOP_VINFO_COST_MODEL_THRESHOLD (loop_vinfo);
 
       if (!vect_use_loop_mask_for_alignment_p (loop_vinfo))
         {
@@ -2206,6 +2213,14 @@ start_over:
       /* One additional iteration because of peeling for gap.  */
       if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
         niters_th += 1;
+
+      /* Use the same condition as vect_transform_loop to decide when to use
+         the cost to determine a versioning threshold.  */
+      if (th >= vect_vf_for_cost (loop_vinfo)
+          && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
+          && ordered_p (th, niters_th))
+        niters_th = ordered_max (poly_uint64 (th), niters_th);
+
       LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo) = niters_th;
     }
 
@@ -2329,7 +2344,7 @@ again:
    be vectorized.  */
 opt_loop_vec_info
 vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
-                   vec_info_shared *shared)
+                   vec_info_shared *shared, bool *vect_epilogues_nomask)
 {
   auto_vector_sizes vector_sizes;
 
@@ -2357,6 +2372,7 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
   poly_uint64 autodetected_vector_size = 0;
   opt_loop_vec_info first_loop_vinfo = opt_loop_vec_info::success (NULL);
   poly_uint64 first_vector_size = 0;
+  unsigned vectorized_loops = 0;
   while (1)
     {
       /* Check the CFG characteristics of the loop (nesting, entry/exit).  */
@@ -2376,14 +2392,17 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
       if (orig_loop_vinfo)
         LOOP_VINFO_ORIG_LOOP_INFO (loop_vinfo) = orig_loop_vinfo;
 
-      opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts);
+      opt_result res = vect_analyze_loop_2 (loop_vinfo, fatal, &n_stmts,
+                                            vect_epilogues_nomask);
       if (res)
         {
           LOOP_VINFO_VECTORIZABLE_P (loop_vinfo) = 1;
+          vectorized_loops++;
 
-          if (loop->simdlen
-              && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
-                           (unsigned HOST_WIDE_INT) loop->simdlen))
+          if ((loop->simdlen
+               && maybe_ne (LOOP_VINFO_VECT_FACTOR (loop_vinfo),
+                            (unsigned HOST_WIDE_INT) loop->simdlen))
+              || *vect_epilogues_nomask)
             {
               if (first_loop_vinfo == NULL)
                 {
@@ -2392,7 +2411,13 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
                   loop->aux = NULL;
                 }
               else
-                delete loop_vinfo;
+                {
+                  /* Set versioning threshold of the original LOOP_VINFO based
+                     on the last vectorization of the epilog.  */
+                  LOOP_VINFO_VERSIONING_THRESHOLD (first_loop_vinfo)
+                    = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
+                  delete loop_vinfo;
+                }
             }
           else
             {
@@ -2401,7 +2426,12 @@ vect_analyze_loop (class loop *loop, loop_vec_info orig_loop_vinfo,
             }
         }
       else
-        delete loop_vinfo;
+        {
+          /* Disable epilog vectorization if we can't determine the epilogs can
+             be vectorized.  */
+          *vect_epilogues_nomask &= vectorized_loops > 1;
+          delete loop_vinfo;
+        }
 
       if (next_size == 0)
         autodetected_vector_size = current_vector_size;
@@ -8468,7 +8498,7 @@ vect_transform_loop_stmt (loop_vec_info loop_vinfo, stmt_vec_info stmt_info,
    Returns scalar epilogue loop if any.  */
 
 class loop *
-vect_transform_loop (loop_vec_info loop_vinfo)
+vect_transform_loop (loop_vec_info loop_vinfo, bool vect_epilogues_nomask)
 {
   class loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   class loop *epilogue = NULL;
@@ -8497,11 +8527,11 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   if (th >= vect_vf_for_cost (loop_vinfo)
       && !LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo))
     {
-      if (dump_enabled_p ())
-        dump_printf_loc (MSG_NOTE, vect_location,
-                         "Profitability threshold is %d loop iterations.\n",
-                         th);
-      check_profitability = true;
+      if (dump_enabled_p ())
+	dump_printf_loc (MSG_NOTE, vect_location,
+			 "Profitability threshold is %d loop iterations.\n",
+			 th);
+      check_profitability = true;
     }
 
   /* Make sure there exists a single-predecessor exit bb.  Do this before
@@ -8519,18 +8549,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
 
   if (LOOP_REQUIRES_VERSIONING (loop_vinfo))
     {
-      poly_uint64 versioning_threshold
-        = LOOP_VINFO_VERSIONING_THRESHOLD (loop_vinfo);
-      if (check_profitability
-          && ordered_p (poly_uint64 (th), versioning_threshold))
-        {
-          versioning_threshold = ordered_max (poly_uint64 (th),
-                                              versioning_threshold);
-          check_profitability = false;
-        }
       class loop *sloop
-        = vect_loop_versioning (loop_vinfo, th, check_profitability,
-                                versioning_threshold);
+        = vect_loop_versioning (loop_vinfo);
       sloop->force_vectorize = false;
       check_profitability = false;
     }
@@ -8557,7 +8577,8 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   bool niters_no_overflow = loop_niters_no_overflow (loop_vinfo);
   epilogue = vect_do_peeling (loop_vinfo, niters, nitersm1, &niters_vector,
                               &step_vector, &niters_vector_mult_vf, th,
-                              check_profitability, niters_no_overflow);
+                              check_profitability, niters_no_overflow,
+                              vect_epilogues_nomask);
   if (LOOP_VINFO_SCALAR_LOOP (loop_vinfo)
       && LOOP_VINFO_SCALAR_LOOP_SCALING (loop_vinfo).initialized_p ())
     scale_loop_frequencies (LOOP_VINFO_SCALAR_LOOP (loop_vinfo),
@@ -8818,7 +8839,7 @@ vect_transform_loop (loop_vec_info loop_vinfo)
   if (LOOP_VINFO_EPILOGUE_P (loop_vinfo))
     epilogue = NULL;
 
-  if (!PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK))
+  if (!vect_epilogues_nomask)
     epilogue = NULL;
 
   if (epilogue)
diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 1456cde4c2c2dec7244c504d2c496248894a4f1e..e87170c592036a6f3f5330e1ebf5d125441861a6 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1480,10 +1480,10 @@ extern void vect_set_loop_condition (class loop *, loop_vec_info,
 extern bool slpeel_can_duplicate_loop_p (const class loop *, const_edge);
 class loop *slpeel_tree_duplicate_loop_to_edge_cfg (class loop *,
                                                      class loop *, edge);
-class loop *vect_loop_versioning (loop_vec_info, unsigned int, bool,
-                                   poly_uint64);
+class loop *vect_loop_versioning (loop_vec_info);
 extern class loop *vect_do_peeling (loop_vec_info, tree, tree,
-                                    tree *, tree *, tree *, int, bool, bool);
+                                    tree *, tree *, tree *, int, bool, bool,
+                                    bool);
 extern void vect_prepare_for_masked_peels (loop_vec_info);
 extern dump_user_location_t find_loop_location (class loop *);
 extern bool vect_can_advance_ivs_p (loop_vec_info);
@@ -1610,7 +1610,8 @@ extern bool check_reduction_path (dump_user_location_t, loop_p, gphi *, tree,
 
 /* Drive for loop analysis stage.  */
 extern opt_loop_vec_info vect_analyze_loop (class loop *, loop_vec_info,
-                                            vec_info_shared *);
+                                            vec_info_shared *,
+                                            bool *);
 extern tree vect_build_loop_niters (loop_vec_info, bool * = NULL);
 extern void vect_gen_vector_loop_niters (loop_vec_info, tree, tree *,
                                          tree *, bool);
@@ -1622,7 +1623,7 @@ extern tree vect_get_loop_mask (gimple_stmt_iterator *, vec_loop_masks *,
                                 unsigned int, tree, unsigned int);
 
 /* Drive for loop transformation stage.  */
-extern class loop *vect_transform_loop (loop_vec_info);
+extern class loop *vect_transform_loop (loop_vec_info, bool);
 extern opt_loop_vec_info vect_analyze_loop_form (class loop *,
                                                  vec_info_shared *);
 extern bool vectorizable_live_operation (stmt_vec_info, gimple_stmt_iterator *,
diff --git a/gcc/tree-vectorizer.c b/gcc/tree-vectorizer.c
index 173e6b51652fd023893b38da786ff28f827553b5..25c3fc8ff55e017ae0b971fa93ce8ce2a07cb94c 100644
--- a/gcc/tree-vectorizer.c
+++ b/gcc/tree-vectorizer.c
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree.h"
 #include "gimple.h"
 #include "predict.h"
+#include "params.h"
 #include "tree-pass.h"
 #include "ssa.h"
 #include "cgraph.h"
@@ -875,6 +876,7 @@ try_vectorize_loop_1 (hash_table *&simduid_to_vf_htab,
   vec_info_shared shared;
   auto_purge_vect_location sentinel;
   vect_location = find_loop_location (loop);
+  bool vect_epilogues_nomask = PARAM_VALUE (PARAM_VECT_EPILOGUES_NOMASK);
   if (LOCATION_LOCUS (vect_location.get_location_t ()) != UNKNOWN_LOCATION
       && dump_enabled_p ())
     dump_printf (MSG_NOTE | MSG_PRIORITY_INTERNALS,
@@ -884,7 +886,7 @@ try_vectorize_loop_1 (hash_table *&simduid_to_vf_htab,
 
   /* Try to analyze the loop, retaining an opt_problem if dump_enabled_p.  */
   opt_loop_vec_info loop_vinfo
-    = vect_analyze_loop (loop, orig_loop_vinfo, &shared);
+    = vect_analyze_loop (loop, orig_loop_vinfo, &shared, &vect_epilogues_nomask);
   loop->aux = loop_vinfo;
 
   if (!loop_vinfo)
@@ -980,7 +982,7 @@ try_vectorize_loop_1 (hash_table *&simduid_to_vf_htab,
                          "loop vectorized using variable length vectors\n");
     }
 
-  loop_p new_loop = vect_transform_loop (loop_vinfo);
+  loop_p new_loop = vect_transform_loop (loop_vinfo, vect_epilogues_nomask);
   (*num_vectorized_loops)++;
   /* Now that the loop has been vectorized, allow it to be unrolled
      etc.  */
@@ -1013,8 +1015,13 @@ try_vectorize_loop_1 (hash_table *&simduid_to_vf_htab,
 
   /* Epilogue of vectorized loop must be vectorized too.  */
   if (new_loop)
-    ret |= try_vectorize_loop_1 (simduid_to_vf_htab, num_vectorized_loops,
-                                 new_loop, loop_vinfo, NULL, NULL);
+    {
+      /* Don't include vectorized epilogues in the "vectorized loops" count.
+       */
+      unsigned dont_count = *num_vectorized_loops;
+      ret |= try_vectorize_loop_1 (simduid_to_vf_htab, &dont_count,
+                                   new_loop, loop_vinfo, NULL, NULL);
+    }
 
   return ret;
 }

--------------0993C3C852F7FE3F3E246957--