From: Richard Biener
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r11-3204] Allow more BB vectorization
X-Act-Checkin: gcc
X-Git-Author: Richard Biener
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 80297f897758f59071968ddff2a04a8d11481117
X-Git-Newrev: c9de716a59c873859df3b3e1fbb993200fce5a73
Message-Id: <20200915124136.AEF86393C84E@sourceware.org>
Date: Tue, 15 Sep 2020 12:41:36 +0000 (GMT)

https://gcc.gnu.org/g:c9de716a59c873859df3b3e1fbb993200fce5a73

commit r11-3204-gc9de716a59c873859df3b3e1fbb993200fce5a73
Author: Richard Biener
Date:   Tue Sep 15 14:35:40 2020 +0200

    Allow more BB vectorization
    
    The following allows more BB vectorization by generally building
    leafs from scalars rather than giving up. Note this is only a first
    step towards this and as can be seen with the exception for node
    splitting it is generally hard to get this heuristic sound.
    I've added variants of the bb-slp-48.c testcase to make sure we
    still try permuting for example.
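To make "building leafs from scalars" concrete, here is a hand-written C sketch of the code the bb-slp-48.c testcase hopes for in foo. It is an illustration, not compiler output. The loads { a[0], a[1] } form one vector operand. The other operand, { x*3, x+1 }, mixes a multiply and an add across its two lanes, so it is the kind of operand that is now assembled from the scalar results instead of abandoning vectorization. The sketch assumes the GCC/Clang `vector_size` extension. The names foo_scalar, foo_vector and check_foo are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* GCC/Clang vector extension: two doubles in one 16-byte vector.  */
typedef double v2df __attribute__ ((vector_size (16)));

double a[2];

/* Scalar form of foo () from bb-slp-48.c, written as plain C.  */
static void
foo_scalar (double x)
{
  double tem1 = a[0] + x * 3.0;
  double tem2 = a[1] + (x + 1.0);
  a[0] = tem1;
  a[1] = tem2;
}

/* Hypothetical hand-written version of the desired vector form.  */
static void
foo_vector (double x)
{
  v2df va;
  memcpy (&va, a, sizeof va);      /* { a[0], a[1] } as one vector.  */
  v2df vx = { x * 3.0, x + 1.0 };  /* Operand built from scalar defs.  */
  va += vx;                        /* One vector add replaces two scalar adds.  */
  memcpy (a, &va, sizeof va);      /* Store both lanes back to a[].  */
}

/* Run both forms from the same initial a[] and check that they agree.
   Leaves the vector result in a[].  */
static void
check_foo (double a0, double a1, double x)
{
  a[0] = a0;
  a[1] = a1;
  foo_scalar (x);
  double s0 = a[0], s1 = a[1];
  a[0] = a0;
  a[1] = a1;
  foo_vector (x);
  assert (a[0] == s0 && a[1] == s1);
}
```

In this form each function keeps the scalar x + 1 and gains one vector add, so two scalar adds become one. That matches the testcase's intent to "elide one add in each function".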
    2020-09-15  Richard Biener
    
            * tree-vect-slp.c (vect_build_slp_tree_2): Also consider
            building an operand from scalars when building it did not
            fail fatally but avoid messing with the upcall splitting
            of groups.
    
            * gcc.dg/vect/bb-slp-48.c: New testcase.
            * gcc.dg/vect/bb-slp-7.c: Adjust.

Diff:
---
 gcc/testsuite/gcc.dg/vect/bb-slp-48.c | 55 +++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/bb-slp-7.c  |  3 +-
 gcc/tree-vect-slp.c                   | 70 ++++++++++++++++++++---------------
 3 files changed, 98 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-48.c b/gcc/testsuite/gcc.dg/vect/bb-slp-48.c
new file mode 100644
index 00000000000..cd229323ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-48.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fgimple -fdump-tree-optimized" } */
+/* { dg-require-effective-target vect_double } */
+
+double a[2];
+
+void __GIMPLE (ssa,startwith ("fix_loops"))
+foo (double x)
+{
+  double tem2;
+  double tem1;
+  double _1;
+  double _2;
+  double _3;
+  double _4;
+
+  __BB(2):
+  _1 = a[0];
+  _2 = x_6(D) * 3.0e+0;
+  tem1_7 = _1 + _2;
+  _3 = x_6(D) + 1.0e+0;
+  _4 = a[1];
+  tem2_8 = _4 + _3;
+  a[0] = tem1_7;
+  a[1] = tem2_8;
+  return;
+}
+
+void __GIMPLE (ssa,startwith ("fix_loops"))
+bar (double x)
+{
+  double tem2;
+  double tem1;
+  double _1;
+  double _2;
+  double _3;
+  double _4;
+
+  __BB(2):
+  _1 = a[0];
+  _2 = x_6(D) * 3.0e+0;
+  tem1_7 = _1 + _2;
+  _3 = x_6(D) + 1.0e+0;
+  _4 = a[1];
+  /* Once with operands swapped. */
+  tem2_8 = _3 + _4;
+  a[0] = tem1_7;
+  a[1] = tem2_8;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" } } */
+/* We want to vectorize as { a[0], a[1] } + { x*3, x+1 } and thus
+   elide one add in each function. */
+/* { dg-final { scan-tree-dump-times " \\+ " 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
index b8bef8cffb4..f12dc275667 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
@@ -22,6 +22,7 @@ main1 (unsigned int x, unsigned int y)
   a2 = *pin++ + 2;
   a3 = *pin++ * 31;

+  /* But we can still vectorize the multiplication or the store. */
   *pout++ = a0 * x;
   *pout++ = a1 * y;
   *pout++ = a2 * x;
@@ -46,5 +47,5 @@ int main (void)
   return 0;
 }

-/* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 15912515caa..d844fe4d6bb 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1444,33 +1444,6 @@ vect_build_slp_tree_2 (vec_info *vinfo,
         continue;
       }

-      /* If the SLP build failed fatally and we analyze a basic-block
-         simply treat nodes we fail to build as externally defined
-         (and thus build vectors from the scalar defs).
-         The cost model will reject outright expensive cases.
-         ??? This doesn't treat cases where permutation ultimatively
-         fails (or we don't try permutation below). Ideally we'd
-         even compute a permutation that will end up with the maximum
-         SLP tree size... */
-      if (is_a (vinfo)
-          && !matches[0]
-          /* ??? Rejecting patterns this way doesn't work. We'd have to
-             do extra work to cancel the pattern so the uses see the
-             scalar version. */
-          && !is_pattern_stmt_p (stmt_info)
-          && !oprnd_info->any_pattern)
-        {
-          if (dump_enabled_p ())
-            dump_printf_loc (MSG_NOTE, vect_location,
-                             "Building vector operands from scalars\n");
-          this_tree_size++;
-          child = vect_create_new_slp_node (oprnd_info->ops);
-          children.safe_push (child);
-          oprnd_info->ops = vNULL;
-          oprnd_info->def_stmts = vNULL;
-          continue;
-        }
-
       /* If the SLP build for operand zero failed and operand zero
          and one can be commutated try that for the scalar stmts
          that failed the match. */
@@ -1542,11 +1515,50 @@ vect_build_slp_tree_2 (vec_info *vinfo,
               children.safe_push (child);
               continue;
             }
-
+          /* We do not undo the swapping here since it might still be
+             the better order for the second operand in case we build
+             the first one from scalars below. */
           ++*npermutes;
         }
-
 fail:
+
+      /* If the SLP build failed and we analyze a basic-block
+         simply treat nodes we fail to build as externally defined
+         (and thus build vectors from the scalar defs).
+         The cost model will reject outright expensive cases.
+         ??? This doesn't treat cases where permutation ultimatively
+         fails (or we don't try permutation below). Ideally we'd
+         even compute a permutation that will end up with the maximum
+         SLP tree size... */
+      if (is_a (vinfo)
+          /* ??? Rejecting patterns this way doesn't work. We'd have to
+             do extra work to cancel the pattern so the uses see the
+             scalar version. */
+          && !is_pattern_stmt_p (stmt_info)
+          && !oprnd_info->any_pattern)
+        {
+          /* But if there's a leading vector sized set of matching stmts
+             fail here so we can split the group. This matches the condition
+             vect_analyze_slp_instance uses. */
+          /* ??? We might want to split here and combine the results to support
+             multiple vector sizes better. */
+          for (j = 0; j < group_size; ++j)
+            if (!matches[j])
+              break;
+          if (!known_ge (j, TYPE_VECTOR_SUBPARTS (vectype)))
+            {
+              if (dump_enabled_p ())
+                dump_printf_loc (MSG_NOTE, vect_location,
+                                 "Building vector operands from scalars\n");
+              this_tree_size++;
+              child = vect_create_new_slp_node (oprnd_info->ops);
+              children.safe_push (child);
+              oprnd_info->ops = vNULL;
+              oprnd_info->def_stmts = vNULL;
+              continue;
+            }
+        }
+
       gcc_assert (child == NULL);
       FOR_EACH_VEC_ELT (children, j, child)
         vect_free_slp_tree (child, false);
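The new fallback in vect_build_slp_tree_2 does not always build a failed operand from scalars. It first measures the leading run of matching stmts. If that run covers at least one full vector, it fails instead so the group can be split, mirroring vect_analyze_slp_instance. The C sketch below models only that decision and rests on a few assumptions:

- The number of vector lanes is a plain constant. The real code compares against the poly_int `TYPE_VECTOR_SUBPARTS (vectype)` using `known_ge`.
- The surrounding conditions (BB vectorization only, no pattern stmts) are omitted.
- The enum and function names are hypothetical, not GCC API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcome of the fallback decision; not GCC API.  */
enum operand_action
{
  BUILD_FROM_SCALARS,  /* Treat the operand as external, build it from scalar defs.  */
  FAIL_TO_SPLIT_GROUP  /* Fail so the caller can split the group.  */
};

/* MATCHES[i] says whether lane i matched when building this operand.
   NUNITS is the number of lanes in one vector (assumed constant here).  */
static enum operand_action
decide_operand_action (const bool *matches, size_t group_size, size_t nunits)
{
  size_t j;

  assert (nunits > 0);
  /* Length of the leading run of matching lanes.  */
  for (j = 0; j < group_size; ++j)
    if (!matches[j])
      break;
  /* At least a full vector of leading matches: fail so the group can be
     split at that point rather than building the operand from scalars.  */
  if (j >= nunits)
    return FAIL_TO_SPLIT_GROUP;
  return BUILD_FROM_SCALARS;
}
```

For example, with two-lane vectors and matches { true, true, false, true }, the first two lanes already fill a vector, so the sketch returns FAIL_TO_SPLIT_GROUP. With { true, false, true, true } it returns BUILD_FROM_SCALARS.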