From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <rguenth@sourceware.org>
Received: by sourceware.org (Postfix, from userid 1666)
 id AEF86393C84E; Tue, 15 Sep 2020 12:41:36 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org AEF86393C84E
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
 s=default; t=1600173696;
 bh=VC90INLCmsX85Uzncqs+jDMc7KaXPBFEET/TrGWFjg8=;
 h=From:To:Subject:Date:From;
 b=dG7cgB+LSVpz+hUKhMwyFjXmYUhkKuRLaanpkZvA0q7QtGBeGsA7u2iDV70h8ATfg
 2gNJDiLjUhUBt+3jpBsxdf2yV6t4xvaF9Vl5S7TH1jYR6VGr56TsnlFEeSTXTot7is
 icVyYoIxnbharADSiz9i12icD1VvB+5QHO/9kEtM=
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: Richard Biener <rguenth@gcc.gnu.org>
To: gcc-cvs@gcc.gnu.org
Subject: [gcc r11-3204] Allow more BB vectorization
X-Act-Checkin: gcc
X-Git-Author: Richard Biener <rguenther@suse.de>
X-Git-Refname: refs/heads/master
X-Git-Oldrev: 80297f897758f59071968ddff2a04a8d11481117
X-Git-Newrev: c9de716a59c873859df3b3e1fbb993200fce5a73
Message-Id: <20200915124136.AEF86393C84E@sourceware.org>
Date: Tue, 15 Sep 2020 12:41:36 +0000 (GMT)
X-BeenThere: gcc-cvs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-cvs mailing list <gcc-cvs.gcc.gnu.org>
List-Unsubscribe: <https://gcc.gnu.org/mailman/options/gcc-cvs>,
 <mailto:gcc-cvs-request@gcc.gnu.org?subject=unsubscribe>
List-Archive: <https://gcc.gnu.org/pipermail/gcc-cvs/>
List-Help: <mailto:gcc-cvs-request@gcc.gnu.org?subject=help>
List-Subscribe: <https://gcc.gnu.org/mailman/listinfo/gcc-cvs>,
 <mailto:gcc-cvs-request@gcc.gnu.org?subject=subscribe>
X-List-Received-Date: Tue, 15 Sep 2020 12:41:36 -0000

https://gcc.gnu.org/g:c9de716a59c873859df3b3e1fbb993200fce5a73

commit r11-3204-gc9de716a59c873859df3b3e1fbb993200fce5a73
Author: Richard Biener <rguenther@suse.de>
Date:   Tue Sep 15 14:35:40 2020 +0200

    Allow more BB vectorization
    
    The following allows more BB vectorization by generally building leaves
    from scalars rather than giving up.  Note that this is only a first
    step in that direction; as the exception for node splitting shows, it
    is hard to make this heuristic sound.  I've added variants of the
    bb-slp-48.c testcase to make sure that we still try permuting, for
    example.
    
    2020-09-15  Richard Biener  <rguenther@suse.de>
    
            * tree-vect-slp.c (vect_build_slp_tree_2): Also consider
            building an operand from scalars when building it did not
            fail fatally but avoid messing with the upcall splitting
            of groups.
    
            * gcc.dg/vect/bb-slp-48.c: New testcase.
            * gcc.dg/vect/bb-slp-7.c: Adjust.

Diff:
---
 gcc/testsuite/gcc.dg/vect/bb-slp-48.c | 55 +++++++++++++++++++++++++++
 gcc/testsuite/gcc.dg/vect/bb-slp-7.c  |  3 +-
 gcc/tree-vect-slp.c                   | 70 ++++++++++++++++++++---------------
 3 files changed, 98 insertions(+), 30 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-48.c b/gcc/testsuite/gcc.dg/vect/bb-slp-48.c
new file mode 100644
index 00000000000..cd229323ecf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-48.c
@@ -0,0 +1,55 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fgimple -fdump-tree-optimized" } */
+/* { dg-require-effective-target vect_double } */
+
+double a[2];
+
+void __GIMPLE (ssa,startwith ("fix_loops"))
+foo (double x)
+{
+  double tem2;
+  double tem1;
+  double _1;
+  double _2;
+  double _3;
+  double _4;
+
+  __BB(2):
+  _1 = a[0];
+  _2 = x_6(D) * 3.0e+0;
+  tem1_7 = _1 + _2;
+  _3 = x_6(D) + 1.0e+0;
+  _4 = a[1];
+  tem2_8 = _4 + _3;
+  a[0] = tem1_7;
+  a[1] = tem2_8;
+  return;
+}
+
+void __GIMPLE (ssa,startwith ("fix_loops"))
+bar (double x)
+{
+  double tem2;
+  double tem1;
+  double _1;
+  double _2;
+  double _3;
+  double _4;
+
+  __BB(2):
+  _1 = a[0];
+  _2 = x_6(D) * 3.0e+0;
+  tem1_7 = _1 + _2;
+  _3 = x_6(D) + 1.0e+0;
+  _4 = a[1];
+  /* Once with operands swapped.  */
+  tem2_8 = _3 + _4;
+  a[0] = tem1_7;
+  a[1] = tem2_8;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "basic block part vectorized" 2 "slp2" } } */
+/* We want to vectorize as { a[0], a[1] } + { x*3, x+1 } and thus
+   elide one add in each function.  */
+/* { dg-final { scan-tree-dump-times " \\+ " 4 "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
index b8bef8cffb4..f12dc275667 100644
--- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
+++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c
@@ -22,6 +22,7 @@ main1 (unsigned int x, unsigned int y)
   a2 = *pin++ + 2;
   a3 = *pin++ * 31;
 
+  /* But we can still vectorize the multiplication or the store.  */
   *pout++ = a0 * x;
   *pout++ = a1 * y;
   *pout++ = a2 * x;
@@ -46,5 +47,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "basic block vectorized" 0 "slp2" } } */
+/* { dg-final { scan-tree-dump-times "basic block vectorized" 1 "slp2" } } */
 
diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index 15912515caa..d844fe4d6bb 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -1444,33 +1444,6 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	  continue;
 	}
 
-      /* If the SLP build failed fatally and we analyze a basic-block
-         simply treat nodes we fail to build as externally defined
-	 (and thus build vectors from the scalar defs).
-	 The cost model will reject outright expensive cases.
-	 ???  This doesn't treat cases where permutation ultimatively
-	 fails (or we don't try permutation below).  Ideally we'd
-	 even compute a permutation that will end up with the maximum
-	 SLP tree size...  */
-      if (is_a <bb_vec_info> (vinfo)
-	  && !matches[0]
-	  /* ???  Rejecting patterns this way doesn't work.  We'd have to
-	     do extra work to cancel the pattern so the uses see the
-	     scalar version.  */
-	  && !is_pattern_stmt_p (stmt_info)
-	  && !oprnd_info->any_pattern)
-	{
-	  if (dump_enabled_p ())
-	    dump_printf_loc (MSG_NOTE, vect_location,
-			     "Building vector operands from scalars\n");
-	  this_tree_size++;
-	  child = vect_create_new_slp_node (oprnd_info->ops);
-	  children.safe_push (child);
-	  oprnd_info->ops = vNULL;
-	  oprnd_info->def_stmts = vNULL;
-	  continue;
-	}
-
       /* If the SLP build for operand zero failed and operand zero
 	 and one can be commutated try that for the scalar stmts
 	 that failed the match.  */
@@ -1542,11 +1515,50 @@ vect_build_slp_tree_2 (vec_info *vinfo,
 	      children.safe_push (child);
 	      continue;
 	    }
-
+	  /* We do not undo the swapping here since it might still be
+	     the better order for the second operand in case we build
+	     the first one from scalars below.  */
 	  ++*npermutes;
 	}
-
 fail:
+
+      /* If the SLP build failed and we analyze a basic-block
+	 simply treat nodes we fail to build as externally defined
+	 (and thus build vectors from the scalar defs).
+	 The cost model will reject outright expensive cases.
+	 ???  This doesn't treat cases where permutation ultimately
+	 fails (or we don't try permutation below).  Ideally we'd
+	 even compute a permutation that will end up with the maximum
+	 SLP tree size...  */
+      if (is_a <bb_vec_info> (vinfo)
+	  /* ???  Rejecting patterns this way doesn't work.  We'd have to
+	     do extra work to cancel the pattern so the uses see the
+	     scalar version.  */
+	  && !is_pattern_stmt_p (stmt_info)
+	  && !oprnd_info->any_pattern)
+	{
+	  /* But if there's a leading vector sized set of matching stmts
+	     fail here so we can split the group.  This matches the condition
+	     vect_analyze_slp_instance uses.  */
+	  /* ???  We might want to split here and combine the results to support
+	     multiple vector sizes better.  */
+	  for (j = 0; j < group_size; ++j)
+	    if (!matches[j])
+	      break;
+	  if (!known_ge (j, TYPE_VECTOR_SUBPARTS (vectype)))
+	    {
+	      if (dump_enabled_p ())
+		dump_printf_loc (MSG_NOTE, vect_location,
+				 "Building vector operands from scalars\n");
+	      this_tree_size++;
+	      child = vect_create_new_slp_node (oprnd_info->ops);
+	      children.safe_push (child);
+	      oprnd_info->ops = vNULL;
+	      oprnd_info->def_stmts = vNULL;
+	      continue;
+	    }
+	}
+
       gcc_assert (child == NULL);
       FOR_EACH_VEC_ELT (children, j, child)
 	vect_free_slp_tree (child, false);